2014-02-05 20:03:11

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 000/213] v2.6.34.15 longterm review

This is the start of the longterm review cycle for the v2.6.34.15 release.
There are 213 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let us know. If anyone is a maintainer of the proper subsystem, and
wants to add a Signed-off-by: line to the patch, please respond with it.

The full queue can be found at:
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git

Please try to get reponses made within 72 hours, or it may be too late.

This will be the last release on 2.6.34.x ; people should be making
migration plans to newer kernels. As such, the focus here has been
with CVE items, data leaks, and bugs that could trigger BUG/oops.

Thanks to Greg (3.x) and Willy (2.6.32) for implicitly assisting in the
choices for commits to be backport candidates for 2.6.34.

Build tested for full bisection, boot tested on i386, x86-64, and powerpc.

Thanks,
Paul.
---

Al Viro (1):
vfs: missed source of ->f_pos races

Alan Cox (1):
x86/msr: Add capabilities check

Alan Stern (1):
USB: EHCI: go back to using the system clock for QH unlinks

Alex He (1):
xHCI: Correct the #define XHCI_LEGACY_DISABLE_SMI

Alex Williamson (2):
KVM: unmap pages from the iommu when slots are removed
KVM: lock slots_lock around device assignment

Alexey Khoroshilov (1):
net/core: Fix potential memory leak in dev_set_alias()

Allison Henderson (1):
ext4: don't dereference null pointer when make_indexed_dir() fails

Anatol Pomozov (1):
ext4: make orphan functions be no-op in no-journal mode

Anderson Lizardo (1):
Bluetooth: Fix incorrect strncpy() in hidp_setup_hid()

Andi Kleen (2):
MCE: Fix vm86 handling for 32bit mce handler
Fix install_process_keyring error handling

Andrew Worsley (1):
USB: serial: ftdi_sio: Handle the old_termios == 0 case e.g.
uart_resume_port()

Andy Honig (2):
KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME
(CVE-2013-1796)
KVM: Fix bounds checking in ioapic indirect register reads
(CVE-2013-1798)

Andy Lutomirski (1):
mm: Hold a file reference in madvise_remove

Anurup m (1):
fs/fscache/stats.c: fix memory leak

Bart Van Assche (1):
Avoid dangling pointer in scsi_requeue_command()

Bernd Schubert (1):
ext4: always set i_op in ext4_mknod()

Bjorn Helgaas (1):
Driver core: treat unregistered bus_types as having no devices

Bjørn Mork (1):
USB: cdc-wdm: fix lockup on error in wdm_read

Brian Foster (1):
ext4: don't let i_reserved_meta_blocks go negative

Chen Gang (1):
drivers/char/ipmi: memcpy, need additional 2 bytes to avoid memory
overflow

Chris Mason (1):
Btrfs: call the ordered free operation without any locks held

Christoffer Dall (1):
mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED

Colin Ian King (1):
USB: echi-dbgp: increase the controller wait time to come out of halt.

Cong Ding (1):
fs/cifs/cifs_dfs_ref.c: fix potential memory leakage

Cong Wang (1):
net: prevent setting ttl=0 via IP_TTL

Dan Carpenter (3):
x86, tls: Off by one limit check
USB: kaweth.c: use GFP_ATOMIC under spin_lock
mtd: cafe_nand: fix an & vs | mistake

Dan Williams (3):
libsas: continue revalidation
SCSI: libsas: fix sas_discover_devices return code handling
fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)

Daniel Borkmann (3):
net: sctp: sctp_auth_key_put: use kzfree instead of kfree
net: sctp: sctp_endpoint_free: zero out secret key data
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree

Daniel J Blueman (1):
Prevent interface errors with Seagate FreeAgent GoFlex

Darren Hart (3):
futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()
futex: Fix bug in WARN_ON for NULL q.pi_state
futex: Test for pi_mutex on fault in futex_wait_requeue_pi()

Dave Hansen (2):
hugetlb: fix resv_map leak in error path
mm: fix vma_resv_map() NULL pointer

Dave Jones (1):
Remove user-triggerable BUG from mpol_to_str

David Howells (1):
keys: fix race with concurrent install_user_keyrings()

David S. Miller (1):
tun: Fix formatting.

David Ward (1):
net_sched: gred: Fix oops in gred_dump() in WRED mode

Denys Vlasenko (1):
coredump: prevent double-free on an error path in core dumper

Dmitry Monakhov (1):
ext4: online defrag is not supported for journaled files

Eddie Wai (1):
bnx2i: Fixed NULL ptr deference for 1G bnx2 Linux iSCSI offload

Emese Revfy (1):
kernel/signal.c: stop info leak via the tkill and the tgkill syscalls

Eric Dumazet (13):
inet: add RCU protection to inet->opt
tcp: allow splice() to build full TSO packets
tcp: tcp_sendpages() should call tcp_push() once
tcp: fix MSG_SENDPAGE_NOTLAST logic
tcp: preserve ACK clocking in TSO
net: guard tcp_set_keepalive() to tcp sockets
netem: fix possible skb leak
net: fix a race in sock_queue_err_skb()
netlink: fix races after skb queueing
softirq: reduce latencies
net: reduce net_rx_action() latency to 2 HZ
tcp: drop SYN+FIN messages
drop_monitor: dont sleep in atomic context

Eric Paris (1):
inotify: fix double free/corruption of stuct user

Eric Sandeen (2):
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer
btrfs: use rcu_barrier() to wait for bdev puts at unmount

Eric Wong (1):
epoll: prevent missed events on EPOLL_CTL_MOD

Eryu Guan (1):
jbd/jbd2: validate sb->s_first in journal_get_superblock()

Eugene Shatokhin (1):
ext4: fix memory leak in ext4_xattr_set_acl()'s error path

Geert Uytterhoeven (1):
sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()

George G. Davis (1):
udf: fix udf_error build warnings

Greg Pearson (1):
pcdp: use early_ioremap/early_iounmap to access pcdp table

Greg Thelen (1):
tmpfs: fix use-after-free of mempolicy object

Hannes Frederic Sowa (1):
ipv6: call udp_push_pending_frames when uncorking a socket with
AF_INET pending data

Hans de Goede (1):
usbdevfs: Correct amount of data copied to user in processcompl_compat

Herbert Xu (1):
bridge: Fix mglist corruption that leads to memory corruption

Hiroaki SHIMODA (1):
net_sched: gact: Fix potential panic in tcf_gact().

Hugh Dickins (1):
mm: fix invalidate_complete_page2() lock ordering

J. Bruce Fields (3):
nfsd4: fix oops on unusual readlike compound
svcrpc: sends on closed socket should stop immediately
svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping

James Bottomley (1):
fix crash in scsi_dispatch_cmd()

Jan Beulich (1):
x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.

Jan Kara (9):
jbd: Fix assertion failure in commit code due to lacking transaction
credits
jbd: Fix lock ordering bug in journal_unmap_buffer()
ext3: Fix fdatasync() for files with only i_size changes
ext3: Fix error handling on inode bitmap corruption
ext4: Fix fs corruption when make_indexed_dir() fails
ext4: fix fdatasync() for files with only i_size changes
ext4: fix error handling on inode bitmap corruption
udf: Fix bitmap overflow on large filesystems with small block size
udf: Fix data corruption for files in ICB

Jesper Dangaard Brouer (1):
net: fix divide by zero in tcp algorithm illinois

Jiri Kosina (1):
tcp: perform DMA to userspace only if there is a task waiting for it

Jiri Slaby (1):
serial: 8250, increase PASS_LIMIT

Johan Hovold (6):
Bluetooth: hci_ldisc: fix NULL-pointer dereference on tty_close
USB: whiteheat: fix memory leak in error path
USB: mos7840: fix urb leak at release
USB: mos7840: fix port-device leak in error path
USB: garmin_gps: fix memory leak on disconnect
USB: serial: fix race between probe and open

Jonathan Nieder (1):
NFSv4: Revalidate uid/gid after open

Jozsef Kadlecsik (1):
netfilter: nf_ct_ipv4: packets with wrong ihl are invalid

Jun Nie (1):
Bluetooth: add NULL pointer check in HCI

Jussi Kivilinna (1):
crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent
data corruption

Kees Cook (9):
HID: validate HID report id size
HID: pantherlord: validate output report details
HID: provide a helper for validating hid reports
HID: zeroplus: validate output report details
HID: LG: validate HID output report details
gen_init_cpio: avoid stack overflow when expanding
exec: do not leave bprm->interp on stack
exec: use -ELOOP for max recursion depth
fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check

Kent Yoder (1):
crypto: sha512 - Fix byte counter overflow in SHA-512

Konrad Rzeszutek Wilk (3):
xen/bootup: allow read_tscp call for Xen PV guests.
xen/bootup: allow {read|write}_cr8 pvops call.
ACPI / cpuidle: Fix NULL pointer issues when cpuidle is disabled

Lachlan McIlroy (1):
ext4: limit group search loop for non-extent files

Larry Finger (1):
b43legacy: Fix crash on unload when firmware not available

Lennart Sorensen (1):
USB: serial: Fix memory leak in sierra_release()

Li Zhong (1):
Fix a dead loop in async_synchronize_full()

Marcin Jurkowski (1):
w1: fix oops when w1_search is called from netlink connector

Mark Ferrell (1):
usb: serial: mos7840: Fixup mos7840_chars_in_buffer()

Mark Rutland (1):
clockevents: Don't allow dummy broadcast timers

Mathias Krause (24):
rose: fix info leak via msg_name in rose_recvmsg()
llc: fix info leak via getsockname()
llc: Fix missing msg_namelen update in llc_ui_recvmsg()
iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()
ax25: fix info leak via msg_name in ax25_recvmsg()
atm: fix info leak in getsockopt(SO_ATMPVC)
atm: fix info leak via getsockname()
atm: update msg_namelen in vcc_recvmsg()
ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)
net: fix info leak in compat dev_ifconf()
net/tun: fix ioctl() based info leaks
xfrm_user: fix info leak in copy_to_user_state()
xfrm_user: fix info leak in copy_to_user_policy()
xfrm_user: fix info leak in copy_to_user_tmpl()
xfrm_user: return error pointer instead of NULL
xfrm_user: return error pointer instead of NULL #2
Bluetooth: HCI - Fix info leak in getsockopt(HCI_FILTER)
Bluetooth: RFCOMM - Fix info leak via getsockname()
Bluetooth: RFCOMM - Fix missing msg_namelen update in
rfcomm_sock_recvmsg()
Bluetooth: L2CAP - Fix info leak via getsockname()
Bluetooth: fix possible info leak in bt_sock_recvmsg()
isofs: avoid info leak on export
udf: avoid info leak on export
dccp: check ccid before dereferencing

Matthew Garrett (1):
xhci: Make handover code more robust

Mel Gorman (2):
mempolicy: fix a race in shared_policy_replace()
x86/mm: Check if PUD is large when validating a kernel address

Namhyung Kim (1):
tracing: Fix double free when function profile init failed

Namjae Jeon (1):
udf: fix memory leak while allocating blocks during write

Neil Horman (4):
crypto: ansi_cprng - Fix off by one error in non-block size request
drop_monitor: fix sleeping in invalid context warning
drop_monitor: Make updating data->skb smp safe
drop_monitor: prevent init path from scheduling on the wrong cpu

Nick Bowler (1):
crypto: ghash - Avoid null pointer dereference if no key is set

Nikola Pajkovsky (1):
udf: fix retun value on error path in udf_load_logicalvol

Nithin Nayak Sujir (1):
tg3: Avoid null pointer dereference in tg3_interrupt in netconsole
mode

Niu Yawei (1):
ext4: fix race in ext4_mb_add_n_trim()

Oleg Nesterov (2):
ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread
wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED
task

Oliver Neukum (1):
USB: cdc-wdm: fix buffer overflow

Paolo Bonzini (3):
block: add and use scsi_blk_cmd_ioctl
block: fail SCSI passthrough ioctls on partition devices
dm: do not forward ioctls from logical volumes to the underlying
device

Patrick McHardy (1):
IPoIB: Fix use-after-free of multicast object

Paul Gortmaker (1):
Revert "percpu: fix chunk range calculation"

Paul Moore (2):
unix: fix a race condition in unix_release()
cipso: don't follow a NULL pointer when setsockopt() is called

Pavel Shilovsky (1):
fuse: fix stat call on 32 bit platforms

Roberto Sassu (1):
ecryptfs: call vfs_setxattr() in ecryptfs_setxattr()

Romain Francoise (1):
x86, random: make ARCH_RANDOM prompt if EMBEDDED, not EXPERT

Samu Kallio (1):
x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

Sarah Sharp (3):
xhci: Increase reset timeout for Renesas 720201 host.
xhci: Reset reserved command ring TRBs on cleanup.
xhci: Don't write zeroed pointers to xHC registers.

Sasha Levin (1):
phonet: Check input from user before allocating

Shawn Guo (1):
kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()

Stanislaw Gruszka (1):
posix-cpu-timers: Fix nanosleep task_struct leak

Stefan Hasko (1):
net: sched: integer overflow fix

Stephen Hemminger (1):
bridge: set priority of STP packets

Sven Schnelle (1):
USB: CDC ACM: Fix NULL pointer dereference

T Makphaibulchoke (1):
kernel/resource.c: fix stack overflow in __reserve_region_with_split()

Takamori Yamaguchi (1):
mm: bugfix: set current->reclaim_state to NULL while returning from
kswapd()

Takashi Iwai (1):
ALSA: seq: Fix missing error handling in snd_seq_timer_open()

Tejun Heo (1):
cgroup: remove incorrect dget/dput() pair in cgroup_create_dir()

Theodore Ts'o (2):
ext4: lock i_mutex when truncating orphan inodes
ext4: avoid hang when mounting non-journal filesystems with orphan
list

Thomas Gleixner (1):
tick: Cleanup NOHZ per cpu data on cpu down

Thomas Jarosch (1):
PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs

Tirupathi Reddy (1):
timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE

Tommi Rantala (1):
sctp: fix memory leak in sctp_datamsg_from_user() when copy from user
space fails

Trond Myklebust (2):
NFSv3: Ensure that do_proc_get_root() reports errors correctly
kernel panic when mount NFSv4

Tyler Hicks (3):
libceph: Fix NULL pointer dereference in auth client code
eCryptfs: Copy up lower inode attrs after setting lower xattr
eCryptfs: Properly check for O_RDONLY flag before doing privileged
open

Vyacheslav Dubeyko (1):
hfsplus: fix potential overflow in hfsplus_file_truncate()

Wang YanQing (1):
video:uvesafb: Fix oops that uvesafb try to execute NX-protected page

Weiping Pan (1):
rds: set correct msg_namelen

Wen Congyang (1):
tracing: Don't call page_to_pfn() if page is NULL

Willy Tarreau (1):
tcp: do_tcp_sendpages() must try to push data out on oom conditions

Wolfgang Frisch (1):
USB: io_ti: Fix NULL dereference in chase_port()

Wu Fengguang (1):
isdnloop: fix and simplify isdnloop_init()

Xiao Guangrong (1):
mm: mmu_notifier: fix freed page still mapped in secondary MMU

Xiaotian Feng (1):
fix Null pointer dereference on disk error

Zach Brown (1):
fuse: verify all ioctl retry iov elements

[email protected] (1):
af_packet: remove BUG statement in tpacket_destruct_skb

stephen hemminger (1):
netlink: wake up netlink listeners sooner (v2)

arch/x86/Kconfig | 2 +-
arch/x86/include/asm/pgtable.h | 5 ++
arch/x86/kernel/cpu/mcheck/mce.c | 9 +-
arch/x86/kernel/msr.c | 3 +
arch/x86/kernel/tls.c | 4 +-
arch/x86/kvm/x86.c | 5 ++
arch/x86/mm/fault.c | 6 +-
arch/x86/mm/init_64.c | 3 +
arch/x86/xen/enlighten.c | 18 +++-
arch/x86/xen/xen-asm_32.S | 14 +--
block/blk-core.c | 3 +
block/blk-exec.c | 7 ++
block/scsi_ioctl.c | 52 +++++++++++
crypto/ansi_cprng.c | 4 +-
crypto/cryptd.c | 11 ++-
crypto/ghash-generic.c | 6 ++
crypto/sha512_generic.c | 2 +-
drivers/acpi/processor_idle.c | 3 +
drivers/ata/libata-core.c | 1 +
drivers/ata/libata-scsi.c | 6 +-
drivers/base/bus.c | 4 +-
drivers/block/cciss.c | 6 +-
drivers/block/ub.c | 3 +-
drivers/block/virtio_blk.c | 4 +-
drivers/bluetooth/hci_ldisc.c | 6 +-
drivers/cdrom/cdrom.c | 3 +-
drivers/char/ipmi/ipmi_bt_sm.c | 4 +-
drivers/firmware/pcdp.c | 4 +-
drivers/hid/hid-core.c | 68 +++++++++++++-
drivers/hid/hid-lg2ff.c | 19 +---
drivers/hid/hid-lg3ff.c | 29 ++----
drivers/hid/hid-lgff.c | 17 +---
drivers/hid/hid-pl.c | 10 ++-
drivers/hid/hid-zpff.c | 18 ++--
drivers/ide/ide-floppy_ioctl.c | 3 +-
drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 19 ++--
drivers/isdn/isdnloop/isdnloop.c | 12 ---
drivers/md/dm-linear.c | 12 ++-
drivers/md/dm-mpath.c | 6 ++
drivers/mtd/nand/cafe_nand.c | 2 +-
drivers/net/tg3.c | 4 +
drivers/net/tun.c | 6 +-
drivers/net/usb/kaweth.c | 2 +-
drivers/net/wireless/b43legacy/main.c | 2 +
drivers/pci/quirks.c | 34 +++++++
drivers/scsi/bnx2i/bnx2i_hwi.c | 3 +
drivers/scsi/libsas/sas_expander.c | 47 ++++------
drivers/scsi/scsi_error.c | 14 +++
drivers/scsi/scsi_lib.c | 13 +++
drivers/scsi/sd.c | 13 ++-
drivers/serial/8250.c | 2 +-
drivers/usb/class/cdc-acm.c | 3 +-
drivers/usb/class/cdc-wdm.c | 25 +++++-
drivers/usb/core/devio.c | 10 ++-
drivers/usb/early/ehci-dbgp.c | 2 +-
drivers/usb/host/ehci-hcd.c | 8 +-
drivers/usb/host/ehci-q.c | 82 +++++++++--------
drivers/usb/host/ehci.h | 3 +-
drivers/usb/host/pci-quirks.c | 22 +++--
drivers/usb/host/xhci-ext-caps.h | 5 +-
drivers/usb/host/xhci-mem.c | 10 +--
drivers/usb/host/xhci.c | 5 +-
drivers/usb/serial/ftdi_sio.c | 4 +
drivers/usb/serial/garmin_gps.c | 7 +-
drivers/usb/serial/io_ti.c | 3 +
drivers/usb/serial/mos7840.c | 11 ++-
drivers/usb/serial/sierra.c | 1 +
drivers/usb/serial/usb-serial.c | 8 ++
drivers/usb/serial/whiteheat.c | 1 +
drivers/video/uvesafb.c | 11 ++-
drivers/w1/w1.c | 3 +-
fs/binfmt_elf.c | 19 +---
fs/binfmt_em86.c | 1 -
fs/binfmt_misc.c | 11 +--
fs/binfmt_script.c | 8 +-
fs/btrfs/async-thread.c | 9 +-
fs/btrfs/volumes.c | 6 ++
fs/ceph/auth_none.c | 6 ++
fs/cifs/cifs_dfs_ref.c | 2 +
fs/compat.c | 10 ++-
fs/compat_ioctl.c | 2 +
fs/ecryptfs/inode.c | 9 +-
fs/ecryptfs/kthread.c | 2 +-
fs/eventpoll.c | 22 ++++-
fs/exec.c | 25 ++++--
fs/ext3/ialloc.c | 8 +-
fs/ext3/inode.c | 17 +++-
fs/ext4/acl.c | 6 +-
fs/ext4/ialloc.c | 8 +-
fs/ext4/inode.c | 17 +++-
fs/ext4/mballoc.c | 12 ++-
fs/ext4/move_extent.c | 7 +-
fs/ext4/namei.c | 26 ++++--
fs/ext4/super.c | 2 +
fs/fscache/stats.c | 2 +-
fs/fuse/dir.c | 1 +
fs/fuse/file.c | 2 +-
fs/fuse/fuse_i.h | 3 +
fs/fuse/inode.c | 17 +++-
fs/hfsplus/extents.c | 2 +-
fs/isofs/export.c | 1 +
fs/jbd/commit.c | 45 +++++++---
fs/jbd/journal.c | 8 ++
fs/jbd/transaction.c | 66 +++++++++-----
fs/jbd2/journal.c | 8 ++
fs/jbd2/transaction.c | 2 +
fs/nfs/nfs3proc.c | 2 +-
fs/nfs/nfs4proc.c | 1 +
fs/nfsd/nfs4xdr.c | 11 ++-
fs/notify/inotify/inotify_fsnotify.c | 1 +
fs/notify/inotify/inotify_user.c | 39 +++-----
fs/splice.c | 7 +-
fs/sysfs/dir.c | 16 ++--
fs/udf/file.c | 35 ++++++--
fs/udf/inode.c | 4 +
fs/udf/namei.c | 1 +
fs/udf/super.c | 14 ++-
fs/udf/udf_sb.h | 2 +-
include/linux/binfmts.h | 3 +-
include/linux/blkdev.h | 3 +
include/linux/hid.h | 8 +-
include/linux/kvm_host.h | 6 ++
include/linux/mempolicy.h | 2 +-
include/linux/page-flags.h | 8 +-
include/linux/socket.h | 2 +-
include/net/inet_sock.h | 14 ++-
include/net/ip.h | 11 +--
include/net/sock.h | 4 +-
include/net/udp.h | 1 +
include/trace/events/kmem.h | 4 +-
kernel/async.c | 13 ++-
kernel/cgroup.c | 2 -
kernel/futex.c | 17 ++--
kernel/posix-cpu-timers.c | 23 ++++-
kernel/ptrace.c | 2 +-
kernel/resource.c | 50 ++++++++---
kernel/sched.c | 3 +-
kernel/signal.c | 2 +-
kernel/softirq.c | 17 ++--
kernel/sys.c | 1 +
kernel/time/tick-broadcast.c | 3 +-
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 2 +-
kernel/trace/ftrace.c | 1 -
mm/hugetlb.c | 29 ++++--
mm/madvise.c | 16 +++-
mm/mempolicy.c | 39 ++++----
mm/mmu_notifier.c | 45 +++++-----
mm/percpu.c | 46 +++++-----
mm/shmem.c | 10 ++-
mm/truncate.c | 3 +-
mm/vmscan.c | 2 +
net/atm/common.c | 3 +
net/atm/pvc.c | 1 +
net/ax25/af_ax25.c | 1 +
net/bluetooth/af_bluetooth.c | 4 +-
net/bluetooth/hci_sock.c | 1 +
net/bluetooth/hidp/core.c | 2 +-
net/bluetooth/l2cap.c | 1 +
net/bluetooth/rfcomm/sock.c | 2 +
net/bridge/br_multicast.c | 3 +-
net/bridge/br_stp_bpdu.c | 2 +
net/core/dev.c | 9 +-
net/core/drop_monitor.c | 113 ++++++++++++-----------
net/core/sock.c | 3 +-
net/dccp/ccid.h | 4 +-
net/dccp/ipv4.c | 15 ++--
net/dccp/ipv6.c | 2 +-
net/ipv4/af_inet.c | 16 ++--
net/ipv4/cipso_ipv4.c | 119 ++++++++++++++-----------
net/ipv4/icmp.c | 23 +++--
net/ipv4/inet_connection_sock.c | 8 +-
net/ipv4/ip_options.c | 38 ++++----
net/ipv4/ip_output.c | 50 ++++++-----
net/ipv4/ip_sockglue.c | 35 +++++---
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 8 ++
net/ipv4/raw.c | 19 +++-
net/ipv4/syncookies.c | 4 +-
net/ipv4/tcp.c | 5 +-
net/ipv4/tcp_illinois.c | 8 +-
net/ipv4/tcp_input.c | 6 +-
net/ipv4/tcp_ipv4.c | 33 ++++---
net/ipv4/tcp_output.c | 7 +-
net/ipv4/udp.c | 24 +++--
net/ipv6/tcp_ipv6.c | 2 +-
net/ipv6/udp.c | 7 +-
net/iucv/af_iucv.c | 2 +
net/llc/af_llc.c | 5 +-
net/netfilter/ipvs/ip_vs_ctl.c | 1 +
net/netlink/af_netlink.c | 26 +++---
net/packet/af_packet.c | 1 -
net/phonet/pep.c | 3 +
net/rds/recv.c | 3 +
net/rose/af_rose.c | 1 +
net/sched/act_gact.c | 14 ++-
net/sched/sch_gred.c | 7 +-
net/sched/sch_htb.c | 2 +-
net/sched/sch_netem.c | 6 +-
net/sctp/auth.c | 2 +-
net/sctp/chunk.c | 7 +-
net/sctp/endpointola.c | 5 ++
net/sctp/socket.c | 2 +-
net/socket.c | 7 +-
net/sunrpc/rpc_pipe.c | 2 +-
net/sunrpc/svc_xprt.c | 10 +--
net/unix/af_unix.c | 7 +-
net/xfrm/xfrm_user.c | 15 +++-
security/keys/process_keys.c | 4 +-
sound/core/seq/seq_timer.c | 8 +-
usr/gen_init_cpio.c | 43 ++++-----
virt/kvm/ioapic.c | 7 +-
virt/kvm/iommu.c | 28 ++++--
virt/kvm/kvm_main.c | 5 +-
214 files changed, 1624 insertions(+), 875 deletions(-)

--
1.8.5.2


2014-02-05 20:03:37

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 013/213] libceph: Fix NULL pointer dereference in auth client code

From: Tyler Hicks <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 2cb33cac622afde897aa02d3dcd9fbba8bae839e upstream.

A malicious monitor can craft an auth reply message that could cause a
NULL function pointer dereference in the client's kernel.

To prevent this, the auth_none protocol handler needs an empty
ceph_auth_client_ops->build_request() function.

CVE-2013-1059

Signed-off-by: Tyler Hicks <[email protected]>
Reported-by: Chanam Park <[email protected]>
Reviewed-by: Seth Arnold <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
[PG: in v2.6.34, file is fs/ceph and not net/ceph]
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ceph/auth_none.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/ceph/auth_none.c b/fs/ceph/auth_none.c
index 8cd9e3af07f7..1d1f9b4cbd87 100644
--- a/fs/ceph/auth_none.c
+++ b/fs/ceph/auth_none.c
@@ -31,6 +31,11 @@ static int is_authenticated(struct ceph_auth_client *ac)
return !xi->starting;
}

+static int build_request(struct ceph_auth_client *ac, void *buf, void *end)
+{
+ return 0;
+}
+
/*
* the generic auth code decode the global_id, and we carry no actual
* authenticate state, so nothing happens here.
@@ -97,6 +102,7 @@ static const struct ceph_auth_client_ops ceph_auth_none_ops = {
.reset = reset,
.destroy = destroy,
.is_authenticated = is_authenticated,
+ .build_request = build_request,
.handle_reply = handle_reply,
.create_authorizer = ceph_auth_none_create_authorizer,
.destroy_authorizer = ceph_auth_none_destroy_authorizer,
--
1.8.5.2

2014-02-05 20:03:48

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 017/213] HID: validate HID report id size

From: Kees Cook <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 43622021d2e2b82ea03d883926605bdd0525e1d1 upstream.

The "Report ID" field of a HID report is used to build indexes of
reports. The kernel's index of these is limited to 256 entries, so any
malicious device that sets a Report ID greater than 255 will trigger
memory corruption on the host:

[ 1347.156239] BUG: unable to handle kernel paging request at ffff88094958a878
[ 1347.156261] IP: [<ffffffff813e4da0>] hid_register_report+0x2a/0x8b

CVE-2013-2888

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
[PG: hid_err() --> dbg_hid() in 2.6.34 baseline]

Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/hid/hid-core.c | 10 +++++++---
include/linux/hid.h | 4 +++-
2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index 07ddda553a95..195e366ea18d 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -56,6 +56,8 @@ struct hid_report *hid_register_report(struct hid_device *device, unsigned type,
struct hid_report_enum *report_enum = device->report_enum + type;
struct hid_report *report;

+ if (id >= HID_MAX_IDS)
+ return NULL;
if (report_enum->report_id_hash[id])
return report_enum->report_id_hash[id];

@@ -367,8 +369,10 @@ static int hid_parser_global(struct hid_parser *parser, struct hid_item *item)

case HID_GLOBAL_ITEM_TAG_REPORT_ID:
parser->global.report_id = item_udata(item);
- if (parser->global.report_id == 0) {
- dbg_hid("report_id 0 is invalid\n");
+ if (parser->global.report_id == 0 ||
+ parser->global.report_id >= HID_MAX_IDS) {
+ dbg_hid("report_id %u is invalid\n",
+ parser->global.report_id);
return -1;
}
return 0;
@@ -545,7 +549,7 @@ static void hid_device_release(struct device *dev)
for (i = 0; i < HID_REPORT_TYPES; i++) {
struct hid_report_enum *report_enum = device->report_enum + i;

- for (j = 0; j < 256; j++) {
+ for (j = 0; j < HID_MAX_IDS; j++) {
struct hid_report *report = report_enum->report_id_hash[j];
if (report)
hid_free_report(report);
diff --git a/include/linux/hid.h b/include/linux/hid.h
index b1344ec4b7fc..85e0942cfd76 100644
--- a/include/linux/hid.h
+++ b/include/linux/hid.h
@@ -410,10 +410,12 @@ struct hid_report {
struct hid_device *device; /* associated device */
};

+#define HID_MAX_IDS 256
+
struct hid_report_enum {
unsigned numbered;
struct list_head report_list;
- struct hid_report *report_id_hash[256];
+ struct hid_report *report_id_hash[HID_MAX_IDS];
};

#define HID_REPORT_TYPES 3
--
1.8.5.2

2014-02-05 20:04:13

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 035/213] net: sctp: sctp_endpoint_free: zero out secret key data

From: Daniel Borkmann <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b5c37fe6e24eec194bb29d22fdd55d73bcc709bf upstream.

On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sctp/endpointola.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 7ec09ba03a1c..e80ba5def747 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -250,6 +250,8 @@ void sctp_endpoint_free(struct sctp_endpoint *ep)
/* Final destructor for endpoint. */
static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
{
+ int i;
+
SCTP_ASSERT(ep->base.dead, "Endpoint is not dead", return);

/* Free up the HMAC transform. */
@@ -272,6 +274,9 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
sctp_inq_free(&ep->base.inqueue);
sctp_bind_addr_free(&ep->base.bind_addr);

+ for (i = 0; i < SCTP_HOW_MANY_SECRETS; ++i)
+ memset(&ep->secret_key[i], 0, SCTP_SECRET_SIZE);
+
/* Remove and free the port */
if (sctp_sk(ep->base.sk)->bind_hash)
sctp_put_port(ep->base.sk);
--
1.8.5.2

2014-02-05 20:04:20

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 032/213] ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 2d8a041b7bfe1097af21441cb77d6af95f4f4680 upstream.

If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Wensong Zhang <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Julian Anastasov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/netfilter/ipvs/ip_vs_ctl.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 36dc1d88c2fa..bd9d805a85a6 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2469,6 +2469,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
{
struct ip_vs_timeout_user t;

+ memset(&t, 0, sizeof(t));
__ip_vs_get_timeouts(&t);
if (copy_to_user(user, &t, sizeof(t)) != 0)
ret = -EFAULT;
--
1.8.5.2

2014-02-05 20:04:27

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 033/213] netfilter: nf_ct_ipv4: packets with wrong ihl are invalid

From: Jozsef Kadlecsik <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 07153c6ec074257ade76a461429b567cff2b3a1e upstream.

It was reported that the Linux kernel sometimes logs:

klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392

ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto(). But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.

The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.

The patch closes netfilter bugzilla id 771.

Signed-off-by: Jozsef Kadlecsik <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 2bb1f87051c4..a0af7a2d6117 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -84,6 +84,14 @@ static int ipv4_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
*dataoff = nhoff + (iph->ihl << 2);
*protonum = iph->protocol;

+ /* Check bogus IP headers */
+ if (*dataoff > skb->len) {
+ pr_debug("nf_conntrack_ipv4: bogus IPv4 packet: "
+ "nhoff %u, ihl %u, skblen %u\n",
+ nhoff, iph->ihl << 2, skb->len);
+ return -NF_ACCEPT;
+ }
+
return NF_ACCEPT;
}

--
1.8.5.2

2014-02-05 20:04:37

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 010/213] block: add and use scsi_blk_cmd_ioctl

From: Paolo Bonzini <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 577ebb374c78314ac4617242f509e2f5e7156649 upstream.

Introduce a wrapper around scsi_cmd_ioctl that takes a block device.

The function will then be enhanced to detect partition block devices
and, in that case, subject the ioctls to whitelisting.

Cc: [email protected]
Cc: Jens Axboe <[email protected]>
Cc: James Bottomley <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
block/scsi_ioctl.c | 7 +++++++
drivers/block/cciss.c | 6 +++---
drivers/block/ub.c | 3 +--
drivers/block/virtio_blk.c | 4 ++--
drivers/cdrom/cdrom.c | 3 +--
drivers/ide/ide-floppy_ioctl.c | 3 +--
drivers/scsi/sd.c | 2 +-
include/linux/blkdev.h | 2 ++
8 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 4f4230b79bb6..57ac93754841 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -691,6 +691,13 @@ int scsi_cmd_ioctl(struct request_queue *q, struct gendisk *bd_disk, fmode_t mod
}
EXPORT_SYMBOL(scsi_cmd_ioctl);

+int scsi_cmd_blk_ioctl(struct block_device *bd, fmode_t mode,
+ unsigned int cmd, void __user *arg)
+{
+ return scsi_cmd_ioctl(bd->bd_disk->queue, bd->bd_disk, mode, cmd, arg);
+}
+EXPORT_SYMBOL(scsi_cmd_blk_ioctl);
+
static int __init blk_scsi_ioctl_init(void)
{
blk_set_cmd_filter_defaults(&blk_default_cmd_filter);
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index eb5ff0531cfb..54bad7584ea4 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1652,7 +1652,7 @@ static int cciss_ioctl(struct block_device *bdev, fmode_t mode,
return status;
}

- /* scsi_cmd_ioctl handles these, below, though some are not */
+ /* scsi_cmd_blk_ioctl handles these, below, though some are not */
/* very meaningful for cciss. SG_IO is the main one people want. */

case SG_GET_VERSION_NUM:
@@ -1663,9 +1663,9 @@ static int cciss_ioctl(struct block_device *bdev, fmode_t mode,
case SG_EMULATED_HOST:
case SG_IO:
case SCSI_IOCTL_SEND_COMMAND:
- return scsi_cmd_ioctl(disk->queue, disk, mode, cmd, argp);
+ return scsi_cmd_blk_ioctl(bdev, mode, cmd, argp);

- /* scsi_cmd_ioctl would normally handle these, below, but */
+ /* scsi_cmd_blk_ioctl would normally handle these, below, but */
/* they aren't a good fit for cciss, as CD-ROMs are */
/* not supported, and we don't have any bus/target/lun */
/* which we present to the kernel. */
diff --git a/drivers/block/ub.c b/drivers/block/ub.c
index 0536b5b29adc..1c1533a59c4d 100644
--- a/drivers/block/ub.c
+++ b/drivers/block/ub.c
@@ -1727,10 +1727,9 @@ static int ub_bd_release(struct gendisk *disk, fmode_t mode)
static int ub_bd_ioctl(struct block_device *bdev, fmode_t mode,
unsigned int cmd, unsigned long arg)
{
- struct gendisk *disk = bdev->bd_disk;
void __user *usermem = (void __user *) arg;

- return scsi_cmd_ioctl(disk->queue, disk, mode, cmd, usermem);
+ return scsi_cmd_blk_ioctl(bdev, mode, cmd, usermem);
}

/*
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2138a7ae050c..4abfa80fdcd6 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -201,8 +201,8 @@ static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI))
return -ENOTTY;

- return scsi_cmd_ioctl(disk->queue, disk, mode, cmd,
- (void __user *)data);
+ return scsi_cmd_blk_ioctl(bdev, mode, cmd,
+ (void __user *)data);
}

/* We provide getgeo only to please some old bootloader/partitioning tools */
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index e3749d0ba68b..5e7c72d3fe39 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2684,12 +2684,11 @@ int cdrom_ioctl(struct cdrom_device_info *cdi, struct block_device *bdev,
{
void __user *argp = (void __user *)arg;
int ret;
- struct gendisk *disk = bdev->bd_disk;

/*
* Try the generic SCSI command ioctl's first.
*/
- ret = scsi_cmd_ioctl(disk->queue, disk, mode, cmd, argp);
+ ret = scsi_cmd_blk_ioctl(bdev, mode, cmd, argp);
if (ret != -ENOTTY)
return ret;

diff --git a/drivers/ide/ide-floppy_ioctl.c b/drivers/ide/ide-floppy_ioctl.c
index 9c2288234dea..05f024caf4c9 100644
--- a/drivers/ide/ide-floppy_ioctl.c
+++ b/drivers/ide/ide-floppy_ioctl.c
@@ -287,8 +287,7 @@ int ide_floppy_ioctl(ide_drive_t *drive, struct block_device *bdev,
* and CDROM_SEND_PACKET (legacy) ioctls
*/
if (cmd != CDROM_SEND_PACKET && cmd != SCSI_IOCTL_SEND_COMMAND)
- err = scsi_cmd_ioctl(bdev->bd_disk->queue, bdev->bd_disk,
- mode, cmd, argp);
+ err = scsi_cmd_blk_ioctl(bdev, mode, cmd, argp);

if (err == -ENOTTY)
err = generic_ide_ioctl(drive, bdev, cmd, arg);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 18e6c59ed12d..654e2674e7c3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -907,7 +907,7 @@ static int sd_ioctl(struct block_device *bdev, fmode_t mode,
case SCSI_IOCTL_GET_BUS_NUMBER:
return scsi_ioctl(sdp, cmd, p);
default:
- error = scsi_cmd_ioctl(disk->queue, disk, mode, cmd, p);
+ error = scsi_cmd_blk_ioctl(bdev, mode, cmd, p);
if (error != -ENOTTY)
return error;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cda62da68108..ba55e497f7dc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -793,6 +793,8 @@ extern void blk_plug_device(struct request_queue *);
extern void blk_plug_device_unlocked(struct request_queue *);
extern int blk_remove_plug(struct request_queue *);
extern void blk_recount_segments(struct request_queue *, struct bio *);
+extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
+ unsigned int, void __user *);
extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
unsigned int, void __user *);
extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
--
1.8.5.2

2014-02-05 20:04:45

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 039/213] tcp: allow splice() to build full TSO packets

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 2f53384424251c06038ae612e56231b96ab610ee upstream.

vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.

4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)

This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.

In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert <[email protected]>
Cc: Nandita Dukkipati <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: H.K. Jerry Chu <[email protected]>
Cc: Maciej Żenczykowski <[email protected]>
Cc: Mahesh Bandewar <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ipv4/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3a8cbf72b06e..cea0a9223c5d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -849,7 +849,7 @@ wait_for_memory:
}

out:
- if (copied)
+ if (copied && !(flags & MSG_MORE))
tcp_push(sk, flags, mss_now, tp->nonagle);
return copied;

--
1.8.5.2

2014-02-05 20:04:51

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 041/213] tcp: fix MSG_SENDPAGE_NOTLAST logic

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit ae62ca7b03217be5e74759dc6d7698c95df498b3 upstream.

commit 35f9c09fe9c72e (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
frags but the last one for a splice() call.

The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.

But some programs pass an arbitrary high value, and the test fails.

The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.

We should both test sd->total_len and fact that another fragment
is in the pipe (pipe->nrbufs > 1)

Many thanks to Willy for providing very clear bug report, bisection
and test programs.

Reported-by: Willy Tarreau <[email protected]>
Bisected-by: Willy Tarreau <[email protected]>
Tested-by: Willy Tarreau <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/splice.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 3bec7c63be64..1c991d6a64b4 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -640,8 +640,10 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe,
ret = buf->ops->confirm(pipe, buf);
if (!ret) {
more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
- if (sd->len < sd->total_len)
+
+ if (sd->len < sd->total_len && pipe->nrbufs > 1)
more |= MSG_SENDPAGE_NOTLAST;
+
if (file->f_op && file->f_op->sendpage)
ret = file->f_op->sendpage(file, buf->page, buf->offset,
sd->len, &pos, more);
--
1.8.5.2

2014-02-05 20:04:58

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 045/213] net: fix divide by zero in tcp algorithm illinois

From: Jesper Dangaard Brouer <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 8f363b77ee4fbf7c3bbcf5ec2c5ca482d396d664 upstream.

Reading TCP stats when using TCP Illinois congestion control algorithm
can cause a divide by zero kernel oops.

The division by zero occur in tcp_illinois_info() at:
do_div(t, ca->cnt_rtt);
where ca->cnt_rtt can become zero (when rtt_reset is called)

Steps to Reproduce:
1. Register tcp_illinois:
# sysctl -w net.ipv4.tcp_congestion_control=illinois
2. Monitor internal TCP information via command "ss -i"
# watch -d ss -i
3. Establish new TCP conn to machine

Either it fails at the initial conn, or else it needs to wait
for a loss or a reset.

This is only related to reading stats. The function avg_delay() also
performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its
calling point in update_params(). Thus, simply fix tcp_illinois_info().

Function tcp_illinois_info() / get_info() is called without
socket lock. Thus, eliminate any race condition on ca->cnt_rtt
by using a local stack variable. Simply reuse info.tcpv_rttcnt,
as its already set to ca->cnt_rtt.
Function avg_delay() is not affected by this race condition, as
its called with the socket lock.

Cc: Petr Matousek <[email protected]>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ipv4/tcp_illinois.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c
index 1eba160b72dc..c35d91f0fb11 100644
--- a/net/ipv4/tcp_illinois.c
+++ b/net/ipv4/tcp_illinois.c
@@ -313,11 +313,13 @@ static void tcp_illinois_info(struct sock *sk, u32 ext,
.tcpv_rttcnt = ca->cnt_rtt,
.tcpv_minrtt = ca->base_rtt,
};
- u64 t = ca->sum_rtt;

- do_div(t, ca->cnt_rtt);
- info.tcpv_rtt = t;
+ if (info.tcpv_rttcnt > 0) {
+ u64 t = ca->sum_rtt;

+ do_div(t, info.tcpv_rttcnt);
+ info.tcpv_rtt = t;
+ }
nla_put(skb, INET_DIAG_VEGASINFO, sizeof(info), &info);
}
}
--
1.8.5.2

2014-02-05 20:05:27

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 074/213] tg3: Avoid null pointer dereference in tg3_interrupt in netconsole mode

From: Nithin Nayak Sujir <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 9c13cb8bb477a83b9a3c9e5a5478a4e21294a760 upstream.

When netconsole is enabled, logging messages generated during tg3_open
can result in a null pointer dereference for the uninitialized tg3
status block. Use the irq_sync flag to disable polling in the early
stages. irq_sync is cleared when the driver is enabling interrupts after
all initialization is completed.

Signed-off-by: Nithin Nayak Sujir <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[PG: drivers/net/ethernet/broadcom/tg3.c --> drivers/net/tg3.c in .34]
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/net/tg3.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index ecc41cffb470..e6045d1998af 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -5278,6 +5278,9 @@ static void tg3_poll_controller(struct net_device *dev)
int i;
struct tg3 *tp = netdev_priv(dev);

+ if (tg3_irq_sync(tp))
+ return;
+
for (i = 0; i < tp->irq_cnt; i++)
tg3_interrupt(tp->napi[i].irq_vec, &tp->napi[i]);
}
@@ -14476,6 +14479,7 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
tp->pm_cap = pm_cap;
tp->rx_mode = TG3_DEF_RX_MODE;
tp->tx_mode = TG3_DEF_TX_MODE;
+ tp->irq_sync = 1;

if (tg3_debug > 0)
tp->msg_enable = tg3_debug;
--
1.8.5.2

2014-02-05 20:05:38

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 077/213] NFSv4: Revalidate uid/gid after open

From: Jonathan Nieder <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

This is a shorter (and more appropriate for stable kernels) analog to
the following upstream commit:

commit 6926afd1925a54a13684ebe05987868890665e2b
Author: Trond Myklebust <[email protected]>
Date: Sat Jan 7 13:22:46 2012 -0500

NFSv4: Save the owner/group name string when doing open

...so that we can do the uid/gid mapping outside the asynchronous RPC
context.
This fixes a bug in the current NFSv4 atomic open code where the client
isn't able to determine what the true uid/gid fields of the file are,
(because the asynchronous nature of the OPEN call denies it the ability
to do an upcall) and so fills them with default values, marking the
inode as needing revalidation.
Unfortunately, in some cases, the VFS will do some additional sanity
checks on the file, and may override the server's decision to allow
the open because it sees the wrong owner/group fields.

Signed-off-by: Trond Myklebust <[email protected]>

Without this patch, logging into two different machines with home
directories mounted over NFS4 and then running "vim" and typing ":q"
in each reliably produces the following error on the second machine:

E137: Viminfo file is not writable: /users/system/rtheys/.viminfo

This regression was introduced by 80e52aced138 ("NFSv4: Don't do
idmapper upcalls for asynchronous RPC calls", merged during the 2.6.32
cycle) --- after the OPEN call, .viminfo has the default values for
st_uid and st_gid (0xfffffffe) cached because we do not want to let
rpciod wait for an idmapper upcall to fill them in.

The fix used in mainline is to save the owner and group as strings and
perform the upcall in _nfs4_proc_open outside the rpciod context,
which takes about 600 lines. For stable, we can do something similar
with a one-liner: make open check for the stale fields and make a
(synchronous) GETATTR call to fill them when needed.

Trond dictated the patch, I typed it in, and Rik tested it.

Addresses http://bugs.debian.org/659111 and
https://bugzilla.redhat.com/789298

Reported-by: Rik Theys <[email protected]>
Explained-by: David Flyn <[email protected]>
Signed-off-by: Jonathan Nieder <[email protected]>
Tested-by: Rik Theys <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[PG: commit 19165bdbb3622cfca0ff66e8b30248d469b849d6 in v3.0.32]
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/nfs/nfs4proc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 8dd330925ede..96e440aba77e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1669,6 +1669,7 @@ static int _nfs4_do_open(struct inode *dir, struct path *path, fmode_t fmode, in
goto err_opendata_put;
if (server->caps & NFS_CAP_POSIX_LOCK)
set_bit(NFS_STATE_POSIX_LOCKS, &state->flags);
+ nfs_revalidate_inode(server, state->inode);
nfs4_opendata_put(opendata);
nfs4_put_state_owner(sp);
*res = state;
--
1.8.5.2

2014-02-05 20:06:31

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 119/213] serial: 8250, increase PASS_LIMIT

From: Jiri Slaby <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e7328ae1848966181a7ac47e8ae6cddbd2cf55f3 upstream.

With virtual machines like qemu, it's pretty common to see "too much
work for irq4" messages nowadays. This happens when a bunch of output
is printed on the emulated serial console. This is caused by too low
PASS_LIMIT. When ISR loops more than the limit, it spits the message.

I've been using a kernel with doubled the limit and I couldn't see no
problems. Maybe it's time to get rid of the message now?

Signed-off-by: Jiri Slaby <[email protected]>
Cc: Alan Cox <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[PG: drivers/tty/serial/8250.c ---> drivers/serial/8250.c in 2.6.34]
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/serial/8250.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c
index c1d79a233476..848894773c64 100644
--- a/drivers/serial/8250.c
+++ b/drivers/serial/8250.c
@@ -82,7 +82,7 @@ static unsigned int skip_txen_test; /* force skip of txen test at init time */
#define DEBUG_INTR(fmt...) do { } while (0)
#endif

-#define PASS_LIMIT 256
+#define PASS_LIMIT 512

#define BOTH_EMPTY (UART_LSR_TEMT | UART_LSR_THRE)

--
1.8.5.2

2014-02-05 20:06:53

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 128/213] Bluetooth: Fix incorrect strncpy() in hidp_setup_hid()

From: Anderson Lizardo <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 0a9ab9bdb3e891762553f667066190c1d22ad62b upstream.

The length parameter should be sizeof(req->name) - 1 because there is no
guarantee that string provided by userspace will contain the trailing
'\0'.

Can be easily reproduced by manually setting req->name to 128 non-zero
bytes prior to ioctl(HIDPCONNADD) and checking the device name setup on
input subsystem:

$ cat /sys/devices/pnp0/00\:04/tty/ttyS0/hci0/hci0\:1/input8/name
AAAAAA[...]AAAAAAAAf0:af:f0:af:f0:af

("f0:af:f0:af:f0:af" is the device bluetooth address, taken from "phys"
field in struct hid_device due to overflow.)

Signed-off-by: Anderson Lizardo <[email protected]>
Acked-by: Marcel Holtmann <[email protected]>
Signed-off-by: Gustavo Padovan <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/bluetooth/hidp/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 280529ad9274..a01808691565 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -790,7 +790,7 @@ static int hidp_setup_hid(struct hidp_session *session,
hid->version = req->version;
hid->country = req->country;

- strncpy(hid->name, req->name, 128);
+ strncpy(hid->name, req->name, sizeof(req->name) - 1);
strncpy(hid->phys, batostr(&src), 64);
strncpy(hid->uniq, batostr(&dst), 64);

--
1.8.5.2

2014-02-05 20:07:05

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 133/213] Bluetooth: fix possible info leak in bt_sock_recvmsg()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 4683f42fde3977bdb4e8a09622788cc8b5313778 upstream.

In case the socket is already shutting down, bt_sock_recvmsg() returns
with 0 without updating msg_namelen leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.

Fix this by moving the msg_namelen assignment in front of the shutdown
test.

Cc: Marcel Holtmann <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Johan Hedberg <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/bluetooth/af_bluetooth.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 404a8500fd03..0891857b7ca2 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -240,14 +240,14 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
if (flags & (MSG_OOB))
return -EOPNOTSUPP;

+ msg->msg_namelen = 0;
+
if (!(skb = skb_recv_datagram(sk, flags, noblock, &err))) {
if (sk->sk_shutdown & RCV_SHUTDOWN)
return 0;
return err;
}

- msg->msg_namelen = 0;
-
copied = skb->len;
if (len < copied) {
msg->msg_flags |= MSG_TRUNC;
--
1.8.5.2

2014-02-05 20:07:18

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 150/213] USB: CDC ACM: Fix NULL pointer dereference

From: Sven Schnelle <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 99f347caa4568cb803862730b3b1f1942639523f upstream.

If a device specifies zero endpoints in its interface descriptor,
the kernel oopses in acm_probe(). Even though that's clearly an
invalid descriptor, we should test wether we have all endpoints.
This is especially bad as this oops can be triggered by just
plugging a USB device in.

Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/class/cdc-acm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index af45f735f6e5..45f85df306e5 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1120,7 +1120,8 @@ skip_normal_probe:
}


- if (data_interface->cur_altsetting->desc.bNumEndpoints < 2)
+ if (data_interface->cur_altsetting->desc.bNumEndpoints < 2 ||
+ control_interface->cur_altsetting->desc.bNumEndpoints == 0)
return -EINVAL;

epctrl = &control_interface->cur_altsetting->endpoint[0].desc;
--
1.8.5.2

2014-02-05 20:07:29

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 156/213] usbdevfs: Correct amount of data copied to user in processcompl_compat

From: Hans de Goede <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 2102e06a5f2e414694921f23591f072a5ba7db9f upstream.

iso data buffers may have holes in them if some packets were short, so for
iso urbs we should always copy the entire buffer, just like the regular
processcompl does.

Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/core/devio.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 85a496754780..3437cf2cdcaf 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -1527,10 +1527,14 @@ static int processcompl_compat(struct async *as, void __user * __user *arg)
void __user *addr = as->userurb;
unsigned int i;

- if (as->userbuffer && urb->actual_length)
- if (copy_to_user(as->userbuffer, urb->transfer_buffer,
- urb->actual_length))
+ if (as->userbuffer && urb->actual_length) {
+ if (urb->number_of_packets > 0) /* Isochronous */
+ i = urb->transfer_buffer_length;
+ else /* Non-Isoc */
+ i = urb->actual_length;
+ if (copy_to_user(as->userbuffer, urb->transfer_buffer, i))
return -EFAULT;
+ }
if (put_user(as->status, &userurb->status))
return -EFAULT;
if (put_user(urb->actual_length, &userurb->actual_length))
--
1.8.5.2

2014-02-05 20:07:25

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 153/213] USB: kaweth.c: use GFP_ATOMIC under spin_lock

From: Dan Carpenter <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e4c7f259c5be99dcfc3d98f913590663b0305bf8 upstream.

The problem is that we call this with a spin lock held. The call tree
is:
kaweth_start_xmit() holds kaweth->device_lock.
-> kaweth_async_set_rx_mode()
-> kaweth_control()
-> kaweth_internal_control_msg()

The kaweth_internal_control_msg() function is only called from
kaweth_control() which used GFP_ATOMIC for its allocations.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/net/usb/kaweth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
index c4c334d9770f..72906eb06b0a 100644
--- a/drivers/net/usb/kaweth.c
+++ b/drivers/net/usb/kaweth.c
@@ -1317,7 +1317,7 @@ static int kaweth_internal_control_msg(struct usb_device *usb_dev,
int retv;
int length = 0; /* shut up GCC */

- urb = usb_alloc_urb(0, GFP_NOIO);
+ urb = usb_alloc_urb(0, GFP_ATOMIC);
if (!urb)
return -ENOMEM;

--
1.8.5.2

2014-02-05 20:08:06

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 171/213] ext4: don't let i_reserved_meta_blocks go negative

From: Brian Foster <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.

If we hit a condition where we have allocated metadata blocks that
were not appropriately reserved, we risk underflow of
ei->i_reserved_meta_blocks. In turn, this can throw
sbi->s_dirtyclusters_counter significantly out of whack and undermine
the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this
occurs and set i_allocated_meta_blocks to avoid this problem.

This condition is reproduced by xfstests 270 against ext2 with
delalloc enabled:

Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost

270 ultimately fails with an inconsistent filesystem and requires an
fsck to repair. The cause of the error is an underflow in
ext4_da_update_reserve_space() due to an unreserved meta block
allocation.

Signed-off-by: Brian Foster <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/inode.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b8965bb679ee..893da43223d4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1089,6 +1089,15 @@ void ext4_da_update_reserve_space(struct inode *inode,
used = ei->i_reserved_data_blocks;
}

+ if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
+ ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d "
+ "with only %d reserved metadata blocks\n", __func__,
+ inode->i_ino, ei->i_allocated_meta_blocks,
+ ei->i_reserved_meta_blocks);
+ WARN_ON(1);
+ ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
+ }
+
/* Update per-inode reservations */
ei->i_reserved_data_blocks -= used;
used += ei->i_allocated_meta_blocks;
--
1.8.5.2

2014-02-05 20:08:22

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 191/213] Fix install_process_keyring error handling

From: Andi Kleen <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 27d6379894be4a81984da4d48002196a83939ca9 upstream.

Fix an incorrect error check that returns 1 for error instead of the
expected error code.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
security/keys/process_keys.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index 71c10cec3c18..058d4fdf5de1 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -207,7 +207,7 @@ static int install_process_keyring(void)
ret = install_process_keyring_to_cred(new);
if (ret < 0) {
abort_creds(new);
- return ret != -EEXIST ?: 0;
+ return ret != -EEXIST ? ret : 0;
}

return commit_creds(new);
--
1.8.5.2

2014-02-05 20:08:34

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 199/213] pcdp: use early_ioremap/early_iounmap to access pcdp table

From: Greg Pearson <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 6c4088ac3a4d82779903433bcd5f048c58fb1aca upstream.

efi_setup_pcdp_console() is called during boot to parse the HCDP/PCDP
EFI system table and setup an early console for printk output. The
routine uses ioremap/iounmap to setup access to the HCDP/PCDP table
information.

The call to ioremap is happening early in the boot process which leads
to a panic on x86_64 systems:

panic+0x01ca
do_exit+0x043c
oops_end+0x00a7
no_context+0x0119
__bad_area_nosemaphore+0x0138
bad_area_nosemaphore+0x000e
do_page_fault+0x0321
page_fault+0x0020
reserve_memtype+0x02a1
__ioremap_caller+0x0123
ioremap_nocache+0x0012
efi_setup_pcdp_console+0x002b
setup_arch+0x03a9
start_kernel+0x00d4
x86_64_start_reservations+0x012c
x86_64_start_kernel+0x00fe

This replaces the calls to ioremap/iounmap in efi_setup_pcdp_console()
with calls to early_ioremap/early_iounmap which can be called during
early boot.

This patch was tested on an x86_64 prototype system which uses the
HCDP/PCDP table for early console setup.

Signed-off-by: Greg Pearson <[email protected]>
Acked-by: Khalid Aziz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/firmware/pcdp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/pcdp.c b/drivers/firmware/pcdp.c
index 51e0e2d8fac6..a330492e06f9 100644
--- a/drivers/firmware/pcdp.c
+++ b/drivers/firmware/pcdp.c
@@ -95,7 +95,7 @@ efi_setup_pcdp_console(char *cmdline)
if (efi.hcdp == EFI_INVALID_TABLE_ADDR)
return -ENODEV;

- pcdp = ioremap(efi.hcdp, 4096);
+ pcdp = early_ioremap(efi.hcdp, 4096);
printk(KERN_INFO "PCDP: v%d at 0x%lx\n", pcdp->rev, efi.hcdp);

if (strstr(cmdline, "console=hcdp")) {
@@ -131,6 +131,6 @@ efi_setup_pcdp_console(char *cmdline)
}

out:
- iounmap(pcdp);
+ early_iounmap(pcdp, 4096);
return rc;
}
--
1.8.5.2

2014-02-05 20:08:29

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 198/213] fuse: verify all ioctl retry iov elements

From: Zach Brown <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit fb6ccff667712c46b4501b920ea73a326e49626a upstream.

Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that
the total iovec from the client doesn't overflow iov_length() but it
only checked the first element. The iovec could still overflow by
starting with a small element. The obvious fix is to check all the
elements.

The overflow case doesn't look dangerous to the kernel as the copy is
limited by the length after the overflow. This fix restores the
intention of returning an error instead of successfully copying less
than the iovec represented.

I found this by code inspection. I built it but don't have a test case.
I'm cc:ing stable because the initial commit did as well.

Signed-off-by: Zach Brown <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/fuse/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f6104a958812..102d58297174 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1664,7 +1664,7 @@ static int fuse_verify_ioctl_iov(struct iovec *iov, size_t count)
size_t n;
u32 max = FUSE_MAX_PAGES_PER_REQ << PAGE_SHIFT;

- for (n = 0; n < count; n++) {
+ for (n = 0; n < count; n++, iov++) {
if (iov->iov_len > (size_t) max)
return -ENOMEM;
max -= iov->iov_len;
--
1.8.5.2

2014-02-05 20:08:49

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 206/213] SCSI: libsas: fix sas_discover_devices return code handling

From: Dan Williams <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e69e5d3d25d6b58543f782a515baeda064e2b601 upstream.

commit b17caa174a7e1fd2e17b26e210d4ee91c4c28b37 upstream.

commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev()
commit 19252de6 [SCSI] libsas: fix wide port hotplug issues

The above commits seem to have confused the return value of
sas_ex_discover_dev which is non-zero on failure and
sas_ex_join_wide_port which just indicates short circuiting discovery on
already established ports. The result is random discovery failures
depending on configuration.

Calls to sas_ex_join_wide_port are the source of the trouble as its
return value is errantly assigned to 'res'. Convert it to bool and stop
returning its result up the stack.

Tested-by: Dan Melnic <[email protected]>
Reported-by: Dan Melnic <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Jack Wang <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/scsi/libsas/sas_expander.c | 39 ++++++++++++--------------------------
1 file changed, 12 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index cb4964b54191..f31f85e03f3a 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -755,7 +755,7 @@ static struct domain_device *sas_ex_discover_end_dev(
}

/* See if this phy is part of a wide port */
-static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
+static bool sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
{
struct ex_phy *phy = &parent->ex_dev.ex_phy[phy_id];
int i;
@@ -771,11 +771,11 @@ static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
sas_port_add_phy(ephy->port, phy->phy);
phy->port = ephy->port;
phy->phy_state = PHY_DEVICE_DISCOVERED;
- return 0;
+ return true;
}
}

- return -ENODEV;
+ return false;
}

static struct domain_device *sas_ex_discover_expander(
@@ -913,8 +913,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
return res;
}

- res = sas_ex_join_wide_port(dev, phy_id);
- if (!res) {
+ if (sas_ex_join_wide_port(dev, phy_id)) {
SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
phy_id, SAS_ADDR(ex_phy->attached_sas_addr));
return res;
@@ -959,8 +958,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
if (SAS_ADDR(ex->ex_phy[i].attached_sas_addr) ==
SAS_ADDR(child->sas_addr)) {
ex->ex_phy[i].phy_state= PHY_DEVICE_DISCOVERED;
- res = sas_ex_join_wide_port(dev, i);
- if (!res)
+ if (sas_ex_join_wide_port(dev, i))
SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
i, SAS_ADDR(ex->ex_phy[i].attached_sas_addr));

@@ -1813,32 +1811,20 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
{
struct ex_phy *ex_phy = &dev->ex_dev.ex_phy[phy_id];
struct domain_device *child;
- bool found = false;
- int res, i;
+ int res;

SAS_DPRINTK("ex %016llx phy%d new device attached\n",
SAS_ADDR(dev->sas_addr), phy_id);
res = sas_ex_phy_discover(dev, phy_id);
if (res)
- goto out;
- /* to support the wide port inserted */
- for (i = 0; i < dev->ex_dev.num_phys; i++) {
- struct ex_phy *ex_phy_temp = &dev->ex_dev.ex_phy[i];
- if (i == phy_id)
- continue;
- if (SAS_ADDR(ex_phy_temp->attached_sas_addr) ==
- SAS_ADDR(ex_phy->attached_sas_addr)) {
- found = true;
- break;
- }
- }
- if (found) {
- sas_ex_join_wide_port(dev, phy_id);
+ return res;
+
+ if (sas_ex_join_wide_port(dev, phy_id))
return 0;
- }
+
res = sas_ex_discover_devices(dev, phy_id);
- if (!res)
- goto out;
+ if (res)
+ return res;
list_for_each_entry(child, &dev->ex_dev.children, siblings) {
if (SAS_ADDR(child->sas_addr) ==
SAS_ADDR(ex_phy->attached_sas_addr)) {
@@ -1848,7 +1834,6 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
break;
}
}
-out:
return res;
}

--
1.8.5.2

2014-02-05 20:08:39

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 212/213] crypto: sha512 - Fix byte counter overflow in SHA-512

From: Kent Yoder <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 25c3d30c918207556ae1d6e663150ebdf902186b upstream.

The current code only increments the upper 64 bits of the SHA-512 byte
counter when the number of bytes hashed happens to hit 2^64 exactly.

This patch increments the upper 64 bits whenever the lower 64 bits
overflows.

Signed-off-by: Kent Yoder <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
crypto/sha512_generic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index 9ed9f60316e5..899b2fa24e50 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -177,7 +177,7 @@ sha512_update(struct shash_desc *desc, const u8 *data, unsigned int len)
index = sctx->count[0] & 0x7f;

/* Update number of bytes */
- if (!(sctx->count[0] += len))
+ if ((sctx->count[0] += len) < len)
sctx->count[1]++;

part_len = 128 - index;
--
1.8.5.2

2014-02-05 20:08:59

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 213/213] video:uvesafb: Fix oops that uvesafb try to execute NX-protected page

From: Wang YanQing <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b78f29ca0516266431688c5eb42d39ce42ec039a upstream.

This patch fix the oops below that catched in my machine

[ 81.560602] uvesafb: NVIDIA Corporation, GT216 Board - 0696a290, Chip Rev , OEM: NVIDIA, VBE v3.0
[ 81.609384] uvesafb: protected mode interface info at c000:d350
[ 81.609388] uvesafb: pmi: set display start = c00cd3b3, set palette = c00cd40e
[ 81.609390] uvesafb: pmi: ports = 3b4 3b5 3ba 3c0 3c1 3c4 3c5 3c6 3c7 3c8 3c9 3cc 3ce 3cf 3d0 3d1 3d2 3d3 3d4 3d5 3da
[ 81.614558] uvesafb: VBIOS/hardware doesn't support DDC transfers
[ 81.614562] uvesafb: no monitor limits have been set, default refresh rate will be used
[ 81.614994] uvesafb: scrolling: ypan using protected mode interface, yres_virtual=4915
[ 81.744147] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 81.744153] BUG: unable to handle kernel paging request at c00cd3b3
[ 81.744159] IP: [<c00cd3b3>] 0xc00cd3b2
[ 81.744167] *pdpt = 00000000016d6001 *pde = 0000000001c7b067 *pte = 80000000000cd163
[ 81.744171] Oops: 0011 [#1] SMP
[ 81.744174] Modules linked in: uvesafb(+) cfbcopyarea cfbimgblt cfbfillrect
[ 81.744178]
[ 81.744181] Pid: 3497, comm: modprobe Not tainted 3.3.0-rc4NX+ #71 Acer Aspire 4741 /Aspire 4741
[ 81.744185] EIP: 0060:[<c00cd3b3>] EFLAGS: 00010246 CPU: 0
[ 81.744187] EIP is at 0xc00cd3b3
[ 81.744189] EAX: 00004f07 EBX: 00000000 ECX: 00000000 EDX: 00000000
[ 81.744191] ESI: f763f000 EDI: f763f6e8 EBP: f57f3a0c ESP: f57f3a00
[ 81.744192] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 81.744195] Process modprobe (pid: 3497, ti=f57f2000 task=f748c600 task.ti=f57f2000)
[ 81.744196] Stack:
[ 81.744197] f82512c5 f759341c 00000000 f57f3a30 c124a9bc 00000001 00000001 000001e0
[ 81.744202] f8251280 f763f000 f7593400 00000000 f57f3a40 c12598dd f5c0c000 00000000
[ 81.744206] f57f3b10 c1255efe c125a21a 00000006 f763f09c 00000000 c1c6cb60 f7593400
[ 81.744210] Call Trace:
[ 81.744215] [<f82512c5>] ? uvesafb_pan_display+0x45/0x60 [uvesafb]
[ 81.744222] [<c124a9bc>] fb_pan_display+0x10c/0x160
[ 81.744226] [<f8251280>] ? uvesafb_vbe_find_mode+0x180/0x180 [uvesafb]
[ 81.744230] [<c12598dd>] bit_update_start+0x1d/0x50
[ 81.744232] [<c1255efe>] fbcon_switch+0x39e/0x550
[ 81.744235] [<c125a21a>] ? bit_cursor+0x4ea/0x560
[ 81.744240] [<c129b6cb>] redraw_screen+0x12b/0x220
[ 81.744245] [<c128843b>] ? tty_do_resize+0x3b/0xc0
[ 81.744247] [<c129ef42>] vc_do_resize+0x3d2/0x3e0
[ 81.744250] [<c129efb4>] vc_resize+0x14/0x20
[ 81.744253] [<c12586bd>] fbcon_init+0x29d/0x500
[ 81.744255] [<c12984c4>] ? set_inverse_trans_unicode+0xe4/0x110
[ 81.744258] [<c129b378>] visual_init+0xb8/0x150
[ 81.744261] [<c129c16c>] bind_con_driver+0x16c/0x360
[ 81.744264] [<c129b47e>] ? register_con_driver+0x6e/0x190
[ 81.744267] [<c129c3a1>] take_over_console+0x41/0x50
[ 81.744269] [<c1257b7a>] fbcon_takeover+0x6a/0xd0
[ 81.744272] [<c12594b8>] fbcon_event_notify+0x758/0x790
[ 81.744277] [<c10929e2>] notifier_call_chain+0x42/0xb0
[ 81.744280] [<c1092d30>] __blocking_notifier_call_chain+0x60/0x90
[ 81.744283] [<c1092d7a>] blocking_notifier_call_chain+0x1a/0x20
[ 81.744285] [<c124a5a1>] fb_notifier_call_chain+0x11/0x20
[ 81.744288] [<c124b759>] register_framebuffer+0x1d9/0x2b0
[ 81.744293] [<c1061c73>] ? ioremap_wc+0x33/0x40
[ 81.744298] [<f82537c6>] uvesafb_probe+0xaba/0xc40 [uvesafb]
[ 81.744302] [<c12bb81f>] platform_drv_probe+0xf/0x20
[ 81.744306] [<c12ba558>] driver_probe_device+0x68/0x170
[ 81.744309] [<c12ba731>] __device_attach+0x41/0x50
[ 81.744313] [<c12b9088>] bus_for_each_drv+0x48/0x70
[ 81.744316] [<c12ba7f3>] device_attach+0x83/0xa0
[ 81.744319] [<c12ba6f0>] ? __driver_attach+0x90/0x90
[ 81.744321] [<c12b991f>] bus_probe_device+0x6f/0x90
[ 81.744324] [<c12b8a45>] device_add+0x5e5/0x680
[ 81.744329] [<c122a1a3>] ? kvasprintf+0x43/0x60
[ 81.744332] [<c121e6e4>] ? kobject_set_name_vargs+0x64/0x70
[ 81.744335] [<c121e6e4>] ? kobject_set_name_vargs+0x64/0x70
[ 81.744339] [<c12bbe9f>] platform_device_add+0xff/0x1b0
[ 81.744343] [<f8252906>] uvesafb_init+0x50/0x9b [uvesafb]
[ 81.744346] [<c100111f>] do_one_initcall+0x2f/0x170
[ 81.744350] [<f82528b6>] ? uvesafb_is_valid_mode+0x66/0x66 [uvesafb]
[ 81.744355] [<c10c6994>] sys_init_module+0xf4/0x1410
[ 81.744359] [<c1157fc0>] ? vfsmount_lock_local_unlock_cpu+0x30/0x30
[ 81.744363] [<c144cb10>] sysenter_do_call+0x12/0x36
[ 81.744365] Code: f5 00 00 00 32 f6 66 8b da 66 d1 e3 66 ba d4 03 8a e3 b0 1c 66 ef b0 1e 66 ef 8a e7 b0 1d 66 ef b0 1f 66 ef e8 fa 00 00 00 61 c3 <60> e8 c8 00 00 00 66 8b f3 66 8b da 66 ba d4 03 b0 0c 8a e5 66
[ 81.744388] EIP: [<c00cd3b3>] 0xc00cd3b3 SS:ESP 0068:f57f3a00
[ 81.744391] CR2: 00000000c00cd3b3
[ 81.744393] ---[ end trace 18b2c87c925b54d6 ]---

Signed-off-by: Wang YanQing <[email protected]>
Cc: Michal Januszewski <[email protected]>
Cc: Alan Cox <[email protected]>
Signed-off-by: Florian Tobias Schandinat <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/video/uvesafb.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
index 7b8839ebf3c4..917c465463db 100644
--- a/drivers/video/uvesafb.c
+++ b/drivers/video/uvesafb.c
@@ -815,8 +815,15 @@ static int __devinit uvesafb_vbe_init(struct fb_info *info)
par->pmi_setpal = pmi_setpal;
par->ypan = ypan;

- if (par->pmi_setpal || par->ypan)
- uvesafb_vbe_getpmi(task, par);
+ if (par->pmi_setpal || par->ypan) {
+ if (__supported_pte_mask & _PAGE_NX) {
+ par->pmi_setpal = par->ypan = 0;
+ printk(KERN_WARNING "uvesafb: NX protection is actively."
+ "We have better not to use the PMI.\n");
+ } else {
+ uvesafb_vbe_getpmi(task, par);
+ }
+ }
#else
/* The protected mode interface is not available on non-x86. */
par->pmi_setpal = par->ypan = 0;
--
1.8.5.2

2014-02-05 20:09:11

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 203/213] cipso: don't follow a NULL pointer when setsockopt() is called

From: Paul Moore <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit a9d0acf8d157c30374af76d43e7f05b5b108be0c upstream.

[ Upstream commit 89d7ae34cdda4195809a5a987f697a517a2a3177 ]

As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).

This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface. In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.

A simple reproducer, kindly supplied by Lin Ming, although you must
have the CIPSO DOI #3 configure on the system first or you will be
caught early in cipso_v4_validate():

#include <sys/types.h>
#include <sys/socket.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <string.h>

struct local_tag {
char type;
char length;
char info[4];
};

struct cipso {
char type;
char length;
char doi[4];
struct local_tag local;
};

int main(int argc, char **argv)
{
int sockfd;
struct cipso cipso = {
.type = IPOPT_CIPSO,
.length = sizeof(struct cipso),
.local = {
.type = 128,
.length = sizeof(struct local_tag),
},
};

memset(cipso.doi, 0, 4);
cipso.doi[3] = 3;

sockfd = socket(AF_INET, SOCK_DGRAM, 0);
#define SOL_IP 0
setsockopt(sockfd, SOL_IP, IP_OPTIONS,
&cipso, sizeof(struct cipso));

return 0;
}

CC: Lin Ming <[email protected]>
Reported-by: Alan Cox <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ipv4/cipso_ipv4.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index d5ef60963183..f8f338874719 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1727,8 +1727,10 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
case CIPSO_V4_TAG_LOCAL:
/* This is a non-standard tag that we only allow for
* local connections, so if the incoming interface is
- * not the loopback device drop the packet. */
- if (!(skb->dev->flags & IFF_LOOPBACK)) {
+ * not the loopback device drop the packet. Further,
+ * there is no legitimate reason for setting this from
+ * userspace so reject it if skb is NULL. */
+ if (skb == NULL || !(skb->dev->flags & IFF_LOOPBACK)) {
err_offset = opt_iter;
goto validate_return_locked;
}
--
1.8.5.2

2014-02-05 20:09:53

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 211/213] PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs

From: Thomas Jarosch <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit cdb1f35dc7de42802527140a3613871c394548e1 upstream.

commit f67fd55fa96f7d7295b43ffbc4a97d8f55e473aa upstream.

Some BIOS implementations leave the Intel GPU interrupts enabled,
even though no one is handling them (f.e. i915 driver is never loaded).
Additionally the interrupt destination is not set up properly
and the interrupt ends up -somewhere-.

These spurious interrupts are "sticky" and the kernel disables
the (shared) interrupt line after 100.000+ generated interrupts.

Fix it by disabling the still enabled interrupts.
This resolves crashes often seen on monitor unplug.

Tested on the following boards:
- Intel DH61CR: Affected
- Intel DH67BL: Affected
- Intel S1200KP server board: Affected
- Asus P8H61-M LE: Affected, but system does not crash.
Probably the IRQ ends up somewhere unnoticed.

According to reports on the net, the Intel DH61WW board is also affected.

Many thanks to Jesse Barnes from Intel for helping
with the register configuration and to Intel in general
for providing public hardware documentation.

Signed-off-by: Thomas Jarosch <[email protected]>
Tested-by: Charlie Suffin <[email protected]>
Signed-off-by: Jesse Barnes <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/pci/quirks.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6938fdc41e79..052af89854c1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2692,6 +2692,40 @@ static void __devinit fixup_ti816x_class(struct pci_dev* dev)
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_TI, 0xb800, fixup_ti816x_class);

+/*
+ * Some BIOS implementations leave the Intel GPU interrupts enabled,
+ * even though no one is handling them (f.e. i915 driver is never loaded).
+ * Additionally the interrupt destination is not set up properly
+ * and the interrupt ends up -somewhere-.
+ *
+ * These spurious interrupts are "sticky" and the kernel disables
+ * the (shared) interrupt line after 100.000+ generated interrupts.
+ *
+ * Fix it by disabling the still enabled interrupts.
+ * This resolves crashes often seen on monitor unplug.
+ */
+#define I915_DEIER_REG 0x4400c
+static void __devinit disable_igfx_irq(struct pci_dev *dev)
+{
+ void __iomem *regs = pci_iomap(dev, 0, 0);
+ if (regs == NULL) {
+ dev_warn(&dev->dev, "igfx quirk: Can't iomap PCI device\n");
+ return;
+ }
+
+ /* Check if any interrupt line is still enabled */
+ if (readl(regs + I915_DEIER_REG) != 0) {
+ dev_warn(&dev->dev, "BIOS left Intel GPU interrupts enabled; "
+ "disabling\n");
+
+ writel(0, regs + I915_DEIER_REG);
+ }
+
+ pci_iounmap(dev, regs);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0102, disable_igfx_irq);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x010a, disable_igfx_irq);
+
static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
struct pci_fixup *end)
{
--
1.8.5.2

2014-02-05 20:10:47

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 209/213] fuse: fix stat call on 32 bit platforms

From: Pavel Shilovsky <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 45c72cd73c788dd18c8113d4a404d6b4a01decf1 upstream.

Now we store attr->ino at inode->i_ino, return attr->ino at the
first time and then return inode->i_ino if the attribute timeout
isn't expired. That's wrong on 32 bit platforms because attr->ino
is 64 bit and inode->i_ino is 32 bit in this case.

Fix this by saving 64 bit ino in fuse_inode structure and returning
it every time we call getattr. Also squash attr->ino into inode->i_ino
explicitly.

Signed-off-by: Pavel Shilovsky <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/fuse/dir.c | 1 +
fs/fuse/fuse_i.h | 3 +++
fs/fuse/inode.c | 17 ++++++++++++++++-
3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 4787ae6c5c1c..b359543c68e5 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -855,6 +855,7 @@ int fuse_update_attributes(struct inode *inode, struct kstat *stat,
if (stat) {
generic_fillattr(inode, stat);
stat->mode = fi->orig_i_mode;
+ stat->ino = fi->orig_ino;
}
}

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e6d614d10467..829aceeb77ad 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -76,6 +76,9 @@ struct fuse_inode {
preserve the original mode */
mode_t orig_i_mode;

+ /** 64 bit inode number */
+ u64 orig_ino;
+
/** Version of last attribute change */
u64 attr_version;

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ec14d19ce501..675aa27d393d 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -86,6 +86,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
fi->nlookup = 0;
fi->attr_version = 0;
fi->writectr = 0;
+ fi->orig_ino = 0;
INIT_LIST_HEAD(&fi->write_files);
INIT_LIST_HEAD(&fi->queued_writes);
INIT_LIST_HEAD(&fi->writepages);
@@ -140,6 +141,18 @@ static int fuse_remount_fs(struct super_block *sb, int *flags, char *data)
return 0;
}

+/*
+ * ino_t is 32-bits on 32-bit arch. We have to squash the 64-bit value down
+ * so that it will fit.
+ */
+static ino_t fuse_squash_ino(u64 ino64)
+{
+ ino_t ino = (ino_t) ino64;
+ if (sizeof(ino_t) < sizeof(u64))
+ ino ^= ino64 >> (sizeof(u64) - sizeof(ino_t)) * 8;
+ return ino;
+}
+
void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
u64 attr_valid)
{
@@ -149,7 +162,7 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
fi->attr_version = ++fc->attr_version;
fi->i_time = attr_valid;

- inode->i_ino = attr->ino;
+ inode->i_ino = fuse_squash_ino(attr->ino);
inode->i_mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
inode->i_nlink = attr->nlink;
inode->i_uid = attr->uid;
@@ -175,6 +188,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
fi->orig_i_mode = inode->i_mode;
if (!(fc->flags & FUSE_DEFAULT_PERMISSIONS))
inode->i_mode &= ~S_ISVTX;
+
+ fi->orig_ino = attr->ino;
}

void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
--
1.8.5.2

2014-02-05 20:08:25

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 193/213] dccp: check ccid before dereferencing

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 276bdb82dedb290511467a5a4fdbe9f0b52dce6f upstream.

ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with
a NULL ccid pointer leading to a NULL pointer dereference. This could
lead to a privilege escalation if the attacker is able to map page 0 and
prepare it with a fake ccid_ops pointer.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Gerrit Renker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/dccp/ccid.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ccid.h b/net/dccp/ccid.h
index 6df6f8ac9636..4f78abbf1045 100644
--- a/net/dccp/ccid.h
+++ b/net/dccp/ccid.h
@@ -218,7 +218,7 @@ static inline int ccid_hc_rx_getsockopt(struct ccid *ccid, struct sock *sk,
u32 __user *optval, int __user *optlen)
{
int rc = -ENOPROTOOPT;
- if (ccid->ccid_ops->ccid_hc_rx_getsockopt != NULL)
+ if (ccid != NULL && ccid->ccid_ops->ccid_hc_rx_getsockopt != NULL)
rc = ccid->ccid_ops->ccid_hc_rx_getsockopt(sk, optname, len,
optval, optlen);
return rc;
@@ -229,7 +229,7 @@ static inline int ccid_hc_tx_getsockopt(struct ccid *ccid, struct sock *sk,
u32 __user *optval, int __user *optlen)
{
int rc = -ENOPROTOOPT;
- if (ccid->ccid_ops->ccid_hc_tx_getsockopt != NULL)
+ if (ccid != NULL && ccid->ccid_ops->ccid_hc_tx_getsockopt != NULL)
rc = ccid->ccid_ops->ccid_hc_tx_getsockopt(sk, optname, len,
optval, optlen);
return rc;
--
1.8.5.2

2014-02-05 20:11:26

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 202/213] futex: Test for pi_mutex on fault in futex_wait_requeue_pi()

From: Darren Hart <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b6070a8d9853eda010a549fa9a09eb8d7269b929 upstream.

If fixup_pi_state_owner() faults, pi_mutex may be NULL. Test
for pi_mutex != NULL before testing the owner against current
and possibly unlocking it.

Signed-off-by: Darren Hart <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Dan Carpenter <[email protected]>
Link: http://lkml.kernel.org/r/dc59890338fc413606f04e5c5b131530734dae3d.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/futex.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index d4e7f0ea1f94..0e8043833223 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2348,7 +2348,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
* fault, unlock the rt_mutex and return the fault to userspace.
*/
if (ret == -EFAULT) {
- if (rt_mutex_owner(pi_mutex) == current)
+ if (pi_mutex && rt_mutex_owner(pi_mutex) == current)
rt_mutex_unlock(pi_mutex);
} else if (ret == -EINTR) {
/*
--
1.8.5.2

2014-02-05 20:12:05

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 210/213] phonet: Check input from user before allocating

From: Sasha Levin <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit bcf1b70ac6eb0ed8286c66e6bf37cb747cbaa04c upstream.

A phonet packet is limited to USHRT_MAX bytes, this is never checked during
tx which means that the user can specify any size he wishes, and the kernel
will attempt to allocate that size.

In the good case, it'll lead to the following warning, but it may also cause
the kernel to kick in the OOM and kill a random task on the server.

[ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730()
[ 8921.749770] Pid: 5081, comm: trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46
[ 8921.756672] Call Trace:
[ 8921.758185] [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0
[ 8921.762868] [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20
[ 8921.765399] [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730
[ 8921.769226] [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20
[ 8921.771686] [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660
[ 8921.773919] [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240
[ 8921.776248] [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0
[ 8921.778294] [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0
[ 8921.780847] [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260
[ 8921.783179] [<ffffffff821b3c65>] __alloc_skb+0x75/0x170
[ 8921.784971] [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260
[ 8921.787111] [<ffffffff821b002e>] ? release_sock+0x7e/0x90
[ 8921.788973] [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20
[ 8921.791052] [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380
[ 8921.792931] [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180
[ 8921.794917] [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90
[ 8921.797053] [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70
[ 8921.798992] [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0
[ 8921.801395] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.803501] [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0
[ 8921.805505] [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140
[ 8921.807860] [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110
[ 8921.809986] [<ffffffff811958e7>] ? might_fault+0x97/0xa0
[ 8921.811998] [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90
[ 8921.814595] [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0
[ 8921.816702] [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200
[ 8921.818819] [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
[ 8921.820863] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.823318] [<ffffffff811e1926>] vfs_writev+0x46/0x60
[ 8921.825219] [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0
[ 8921.827127] [<ffffffff82658039>] system_call_fastpath+0x16/0x1b
[ 8921.829384] ---[ end trace dffe390f30db9eb7 ]---

Signed-off-by: Sasha Levin <[email protected]>
Acked-by: Rémi Denis-Courmont <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/phonet/pep.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index dc1e8ae81781..ca21189392fe 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -862,6 +862,9 @@ static int pep_sendmsg(struct kiocb *iocb, struct sock *sk,
int flags = msg->msg_flags;
int err, done;

+ if (len > USHORT_MAX)
+ return -EMSGSIZE;
+
if ((msg->msg_flags & ~(MSG_DONTWAIT|MSG_EOR|MSG_NOSIGNAL|
MSG_CMSG_COMPAT)) ||
!(msg->msg_flags & MSG_EOR))
--
1.8.5.2

2014-02-05 20:12:32

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 208/213] eCryptfs: Properly check for O_RDONLY flag before doing privileged open

From: Tyler Hicks <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 9fe79d7600497ed8a95c3981cbe5b73ab98222f0 upstream.

If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.

The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.

Signed-off-by: Tyler Hicks <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Acked-by: Dan Carpenter <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ecryptfs/kthread.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ecryptfs/kthread.c b/fs/ecryptfs/kthread.c
index d8c3a373aafa..920d5d9a0cdb 100644
--- a/fs/ecryptfs/kthread.c
+++ b/fs/ecryptfs/kthread.c
@@ -149,7 +149,7 @@ int ecryptfs_privileged_open(struct file **lower_file,
(*lower_file) = dentry_open(lower_dentry, lower_mnt, flags, cred);
if (!IS_ERR(*lower_file))
goto out;
- if (flags & O_RDONLY) {
+ if ((flags & O_ACCMODE) == O_RDONLY) {
rc = PTR_ERR((*lower_file));
goto out;
}
--
1.8.5.2

2014-02-05 20:08:16

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 157/213] epoll: prevent missed events on EPOLL_CTL_MOD

From: Eric Wong <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 128dd1759d96ad36c379240f8b9463e8acfd37a1 upstream.

EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
ensure events are not missed. Since the modifications to the interest
mask are not protected by the same lock as ep_poll_callback, we need to
ensure the change is visible to other CPUs calling ep_poll_callback.

We also need to ensure f_op->poll() has an up-to-date view of past
events which occured before we modified the interest mask. So this
barrier also pairs with the barrier in wq_has_sleeper().

This should guarantee either ep_poll_callback or f_op->poll() (or both)
will notice the readiness of a recently-ready/modified item.

This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
http://thread.gmane.org/gmane.linux.kernel/1408782/

Signed-off-by: Eric Wong <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Davide Libenzi <[email protected]>
Cc: Hans de Goede <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: David Miller <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andreas Voellmy <[email protected]>
Tested-by: "Junchang(Jason) Wang" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/eventpoll.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 34ca5ca9c3e8..f8a6c0876a7a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1033,10 +1033,30 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
* otherwise we might miss an event that happens between the
* f_op->poll() call and the new event set registering.
*/
- epi->event.events = event->events;
+ epi->event.events = event->events; /* need barrier below */
epi->event.data = event->data; /* protected by mtx */

/*
+ * The following barrier has two effects:
+ *
+ * 1) Flush epi changes above to other CPUs. This ensures
+ * we do not miss events from ep_poll_callback if an
+ * event occurs immediately after we call f_op->poll().
+ * We need this because we did not take ep->lock while
+ * changing epi above (but ep_poll_callback does take
+ * ep->lock).
+ *
+ * 2) We also need to ensure we do not miss _past_ events
+ * when calling f_op->poll(). This barrier also
+ * pairs with the barrier in wq_has_sleeper (see
+ * comments for wq_has_sleeper).
+ *
+ * This barrier will now guarantee ep_poll_callback or f_op->poll
+ * (or both) will notice the readiness of an item.
+ */
+ smp_mb();
+
+ /*
* Get current event bits. We can safely use the file* here because
* its usage count has been increased by the caller of this function.
*/
--
1.8.5.2

2014-02-05 20:12:56

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 207/213] fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)

From: Dan Williams <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 57fc2e335fd3c2f898ee73570dc81426c28dc7b4 upstream.

Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.

A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain. When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.

Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh. Since i/o that is released by scsi_restart_operations has been
blocked for at least one eh cycle, this implementation allows those
i/o's to run before another eh cycle starts to discourage hung task
timeouts.

Reported-by: Tom Jackson <[email protected]>
Tested-by: Tom Jackson <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/scsi/scsi_error.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 7ad53fa42766..3a56835c3ad3 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1614,6 +1614,20 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
* requests are started.
*/
scsi_run_host_queues(shost);
+
+ /*
+ * if eh is active and host_eh_scheduled is pending we need to re-run
+ * recovery. we do this check after scsi_run_host_queues() to allow
+ * everything pent up since the last eh run a chance to make forward
+ * progress before we sync again. Either we'll immediately re-run
+ * recovery or scsi_device_unbusy() will wake us again when these
+ * pending commands complete.
+ */
+ spin_lock_irqsave(shost->host_lock, flags);
+ if (shost->host_eh_scheduled)
+ if (scsi_host_set_state(shost, SHOST_RECOVERY))
+ WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+ spin_unlock_irqrestore(shost->host_lock, flags);
}

/**
--
1.8.5.2

2014-02-05 20:13:06

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 204/213] Avoid dangling pointer in scsi_requeue_command()

From: Bart Van Assche <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 940f5d47e2f2e1fa00443921a0abf4822335b54d upstream.

When we call scsi_unprep_request() the command associated with the request
gets destroyed and therefore drops its reference on the device. If this was
the only reference, the device may get released and we end up with a NULL
pointer deref when we call blk_requeue_request.

Reported-by: Mike Christie <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Reviewed-by: Tejun Heo <[email protected]>
[jejb: enhance commend and add commit log for stable]
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/scsi/scsi_lib.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6712297407bb..a7e6572940ef 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -484,15 +484,26 @@ static void scsi_run_queue(struct request_queue *q)
*/
static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
{
+ struct scsi_device *sdev = cmd->device;
struct request *req = cmd->request;
unsigned long flags;

+ /*
+ * We need to hold a reference on the device to avoid the queue being
+ * killed after the unlock and before scsi_run_queue is invoked which
+ * may happen because scsi_unprep_request() puts the command which
+ * releases its reference on the device.
+ */
+ get_device(&sdev->sdev_gendev);
+
spin_lock_irqsave(q->queue_lock, flags);
scsi_unprep_request(req);
blk_requeue_request(q, req);
spin_unlock_irqrestore(q->queue_lock, flags);

scsi_run_queue(q);
+
+ put_device(&sdev->sdev_gendev);
}

void scsi_next_command(struct scsi_cmnd *cmd)
--
1.8.5.2

2014-02-05 20:13:03

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 201/213] futex: Fix bug in WARN_ON for NULL q.pi_state

From: Darren Hart <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f27071cb7fe3e1d37a9dbe6c0dfc5395cd40fa43 upstream.

The WARN_ON in futex_wait_requeue_pi() for a NULL q.pi_state was testing
the address (&q.pi_state) of the pointer instead of the value
(q.pi_state) of the pointer. Correct it accordingly.

Signed-off-by: Darren Hart <[email protected]>
Cc: Dave Jones <[email protected]>
Link: http://lkml.kernel.org/r/1c85d97f6e5f79ec389a4ead3e367363c74bd09a.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/futex.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 4a8f72850152..d4e7f0ea1f94 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2321,7 +2321,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
* signal. futex_unlock_pi() will not destroy the lock_ptr nor
* the pi_state.
*/
- WARN_ON(!&q.pi_state);
+ WARN_ON(!q.pi_state);
pi_mutex = &q.pi_state->pi_mutex;
ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
debug_rt_mutex_free_waiter(&rt_waiter);
--
1.8.5.2

2014-02-05 20:12:54

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 205/213] libsas: continue revalidation

From: Dan Williams <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 26f2f199ff150d8876b2641c41e60d1c92d2fb81 upstream.

Continue running revalidation until no more broadcast devices are
discovered. Fixes cases where re-discovery completes too early in a
domain with multiple expanders with pending re-discovery events.
Servicing BCNs can get backed up behind error recovery.

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/scsi/libsas/sas_expander.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 4ee42bb48dcd..cb4964b54191 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1947,9 +1947,7 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
struct domain_device *dev = NULL;

res = sas_find_bcast_dev(port_dev, &dev);
- if (res)
- goto out;
- if (dev) {
+ while (res == 0 && dev) {
struct expander_device *ex = &dev->ex_dev;
int i = 0, phy_id;

@@ -1961,8 +1959,10 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
res = sas_rediscover(dev, phy_id);
i = phy_id + 1;
} while (i < ex->num_phys);
+
+ dev = NULL;
+ res = sas_find_bcast_dev(port_dev, &dev);
}
-out:
return res;
}

--
1.8.5.2

2014-02-05 20:12:51

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 200/213] futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()

From: Darren Hart <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef upstream.

If uaddr == uaddr2, then we have broken the rule of only requeueing
from a non-pi futex to a pi futex with this call. If we attempt this,
as the trinity test suite manages to do, we miss early wakeups as
q.key is equal to key2 (because they are the same uaddr). We will then
attempt to dereference the pi_mutex (which would exist had the futex_q
been properly requeued to a pi futex) and trigger a NULL pointer
dereference.

Signed-off-by: Darren Hart <[email protected]>
Cc: Dave Jones <[email protected]>
Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654807774227.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/futex.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 8b467b4a437f..4a8f72850152 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2204,11 +2204,11 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
* @uaddr2: the pi futex we will take prior to returning to user-space
*
* The caller will wait on uaddr and will be requeued by futex_requeue() to
- * uaddr2 which must be PI aware. Normal wakeup will wake on uaddr2 and
- * complete the acquisition of the rt_mutex prior to returning to userspace.
- * This ensures the rt_mutex maintains an owner when it has waiters; without
- * one, the pi logic wouldn't know which task to boost/deboost, if there was a
- * need to.
+ * uaddr2 which must be PI aware and unique from uaddr. Normal wakeup will wake
+ * on uaddr2 and complete the acquisition of the rt_mutex prior to returning to
+ * userspace. This ensures the rt_mutex maintains an owner when it has waiters;
+ * without one, the pi logic would not know which task to boost/deboost, if
+ * there was a need to.
*
* We call schedule in futex_wait_queue_me() when we enqueue and return there
* via the following:
@@ -2245,6 +2245,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
struct futex_q q;
int res, ret;

+ if (uaddr == uaddr2)
+ return -EINVAL;
+
if (!bitset)
return -EINVAL;

--
1.8.5.2

2014-02-05 20:08:11

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 178/213] ext4: lock i_mutex when truncating orphan inodes

From: Theodore Ts'o <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 721e3eba21e43532e438652dd8f1fcdfce3187e7 upstream.

Commit c278531d39 added a warning when ext4_flush_unwritten_io() is
called without i_mutex being taken. It had previously not been taken
during orphan cleanup since races weren't possible at that point in
the mount process, but as a result of this c278531d39, we will now see
a kernel WARN_ON in this case. Take the i_mutex in
ext4_orphan_cleanup() to suppress this warning.

Reported-by: Alexander Beregalov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Zheng Liu <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/super.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6928d5ad2c0d..90906948e242 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2028,7 +2028,9 @@ static void ext4_orphan_cleanup(struct super_block *sb,
__func__, inode->i_ino, inode->i_size);
jbd_debug(2, "truncating inode %lu to %lld bytes\n",
inode->i_ino, inode->i_size);
+ mutex_lock(&inode->i_mutex);
ext4_truncate(inode);
+ mutex_unlock(&inode->i_mutex);
nr_truncates++;
} else {
ext4_msg(sb, KERN_DEBUG,
--
1.8.5.2

2014-02-05 20:15:01

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 197/213] vfs: missed source of ->f_pos races

From: Al Viro <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 0e665d5d1125f9f4ccff56a75e814f10f88861a2 upstream.

compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing
as sys_{read,write}{,v}().

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/compat.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index 633e63c32aa7..388555d404bf 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1231,11 +1231,14 @@ compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec,
struct file *file;
int fput_needed;
ssize_t ret;
+ loff_t pos;

file = fget_light(fd, &fput_needed);
if (!file)
return -EBADF;
- ret = compat_readv(file, vec, vlen, &file->f_pos);
+ pos = file->f_pos;
+ ret = compat_readv(file, vec, vlen, &pos);
+ file->f_pos = pos;
fput_light(file, fput_needed);
return ret;
}
@@ -1288,11 +1291,14 @@ compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec,
struct file *file;
int fput_needed;
ssize_t ret;
+ loff_t pos;

file = fget_light(fd, &fput_needed);
if (!file)
return -EBADF;
- ret = compat_writev(file, vec, vlen, &file->f_pos);
+ pos = file->f_pos;
+ ret = compat_writev(file, vec, vlen, &pos);
+ file->f_pos = pos;
fput_light(file, fput_needed);
return ret;
}
--
1.8.5.2

2014-02-05 20:08:00

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 169/213] ext3: Fix fdatasync() for files with only i_size changes

From: Jan Kara <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 156bddd8e505b295540f3ca0e27dda68cb0d49aa upstream.

Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.

Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.

Reported-by: Kristian Nielsen <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext3/inode.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index ea33bdf0a300..f841730b751e 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -2959,6 +2959,8 @@ static int ext3_do_update_inode(handle_t *handle,
struct ext3_inode_info *ei = EXT3_I(inode);
struct buffer_head *bh = iloc->bh;
int err = 0, rc, block;
+ int need_datasync = 0;
+ __le32 disksize;

again:
/* we can't allow multiple procs in here at once, its a bit racey */
@@ -2996,7 +2998,11 @@ again:
raw_inode->i_gid_high = 0;
}
raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
- raw_inode->i_size = cpu_to_le32(ei->i_disksize);
+ disksize = cpu_to_le32(ei->i_disksize);
+ if (disksize != raw_inode->i_size) {
+ need_datasync = 1;
+ raw_inode->i_size = disksize;
+ }
raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec);
raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);
raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
@@ -3012,8 +3018,11 @@ again:
if (!S_ISREG(inode->i_mode)) {
raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl);
} else {
- raw_inode->i_size_high =
- cpu_to_le32(ei->i_disksize >> 32);
+ disksize = cpu_to_le32(ei->i_disksize >> 32);
+ if (disksize != raw_inode->i_size_high) {
+ raw_inode->i_size_high = disksize;
+ need_datasync = 1;
+ }
if (ei->i_disksize > 0x7fffffffULL) {
struct super_block *sb = inode->i_sb;
if (!EXT3_HAS_RO_COMPAT_FEATURE(sb,
@@ -3066,6 +3075,8 @@ again:
ext3_clear_inode_state(inode, EXT3_STATE_NEW);

atomic_set(&ei->i_sync_tid, handle->h_transaction->t_tid);
+ if (need_datasync)
+ atomic_set(&ei->i_datasync_tid, handle->h_transaction->t_tid);
out_brelse:
brelse (bh);
ext3_std_error(inode->i_sb, err);
--
1.8.5.2

2014-02-05 20:15:36

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 192/213] mtd: cafe_nand: fix an & vs | mistake

From: Dan Carpenter <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 48f8b641297df49021093763a3271119a84990a2 upstream.

The intent here was clearly to set result to true if the 0x40000000 flag
was set. But instead there was a | vs & typo and we always set result
to true.

Artem: check the spec at
wiki.laptop.org/images/5/5c/88ALP01_Datasheet_July_2007.pdf
and this fix looks correct.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/mtd/nand/cafe_nand.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/cafe_nand.c b/drivers/mtd/nand/cafe_nand.c
index e5a9f9ccea60..882d01910804 100644
--- a/drivers/mtd/nand/cafe_nand.c
+++ b/drivers/mtd/nand/cafe_nand.c
@@ -104,7 +104,7 @@ static const char *part_probes[] = { "cmdlinepart", "RedBoot", NULL };
static int cafe_device_ready(struct mtd_info *mtd)
{
struct cafe_priv *cafe = mtd->priv;
- int result = !!(cafe_readl(cafe, NAND_STATUS) | 0x40000000);
+ int result = !!(cafe_readl(cafe, NAND_STATUS) & 0x40000000);
uint32_t irqs = cafe_readl(cafe, NAND_IRQ);

cafe_writel(cafe, irqs, NAND_IRQ);
--
1.8.5.2

2014-02-05 20:15:32

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 196/213] svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping

From: "J. Bruce Fields" <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit d10f27a750312ed5638c876e4bd6aa83664cccd8 upstream.

The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that is has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available. If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The results is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.

Reported-by: Michael Tokarev <[email protected]>
Tested-by: Michael Tokarev <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sunrpc/svc_xprt.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 957a7e88e827..afa0bceb67ad 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -310,7 +310,6 @@ static void svc_thread_dequeue(struct svc_pool *pool, struct svc_rqst *rqstp)
*/
void svc_xprt_enqueue(struct svc_xprt *xprt)
{
- struct svc_serv *serv = xprt->xpt_server;
struct svc_pool *pool;
struct svc_rqst *rqstp;
int cpu;
@@ -384,8 +383,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
rqstp, rqstp->rq_xprt);
rqstp->rq_xprt = xprt;
svc_xprt_get(xprt);
- rqstp->rq_reserved = serv->sv_max_mesg;
- atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
pool->sp_stats.threads_woken++;
BUG_ON(xprt->xpt_pool != pool);
wake_up(&rqstp->rq_wait);
@@ -663,8 +660,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
if (xprt) {
rqstp->rq_xprt = xprt;
svc_xprt_get(xprt);
- rqstp->rq_reserved = serv->sv_max_mesg;
- atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
} else {
/* No data pending. Go to sleep */
svc_thread_enqueue(pool, rqstp);
@@ -754,6 +749,8 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
} else
len = xprt->xpt_ops->xpo_recvfrom(rqstp);
dprintk("svc: got len=%d\n", len);
+ rqstp->rq_reserved = serv->sv_max_mesg;
+ atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
}

/* No data, incomplete (TCP) read, or accept() */
--
1.8.5.2

2014-02-05 20:16:19

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 194/213] Remove user-triggerable BUG from mpol_to_str

From: Dave Jones <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 80de7c3138ee9fd86a98696fd2cf7ad89b995d0a upstream.

Trivially triggerable, found by trinity:

kernel BUG at mm/mempolicy.c:2546!
Process trinity-child2 (pid: 23988, threadinfo ffff88010197e000, task ffff88007821a670)
Call Trace:
show_numa_map+0xd5/0x450
show_pid_numa_map+0x13/0x20
traverse+0xf2/0x230
seq_read+0x34b/0x3e0
vfs_read+0xac/0x180
sys_pread64+0xa2/0xc0
system_call_fastpath+0x1a/0x1f
RIP: mpol_to_str+0x156/0x360

Signed-off-by: Dave Jones <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
mm/mempolicy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ae43da3aff5a..1d5c89a7e128 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2330,7 +2330,7 @@ int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
break;

default:
- BUG();
+ return -EINVAL;
}

l = strlen(policy_types[mode]);
--
1.8.5.2

2014-02-05 20:16:22

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 195/213] svcrpc: sends on closed socket should stop immediately

From: "J. Bruce Fields" <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f06f00a24d76e168ecb38d352126fd203937b601 upstream.

svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately. Meanwhile other
threads could send further replies before the socket is really shut
down. This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

Reported-by: Malahal Naineni <[email protected]>
Tested-by: Malahal Naineni <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sunrpc/svc_xprt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 33df29bd8c61..957a7e88e827 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -807,7 +807,8 @@ int svc_send(struct svc_rqst *rqstp)

/* Grab mutex to serialize outgoing data. */
mutex_lock(&xprt->xpt_mutex);
- if (test_bit(XPT_DEAD, &xprt->xpt_flags))
+ if (test_bit(XPT_DEAD, &xprt->xpt_flags)
+ || test_bit(XPT_CLOSE, &xprt->xpt_flags))
len = -ENOTCONN;
else
len = xprt->xpt_ops->xpo_sendto(rqstp);
--
1.8.5.2

2014-02-05 20:16:17

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 190/213] udf: fix retun value on error path in udf_load_logicalvol

From: Nikola Pajkovsky <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 68766a2edcd5cd744262a70a2f67a320ac944760 upstream.

In case we detect a problem and bail out, we fail to set "ret" to a
nonzero value, and udf_load_logicalvol will mistakenly report success.

Signed-off-by: Nikola Pajkovsky <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/udf/super.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/udf/super.c b/fs/udf/super.c
index 5ece6d6721f8..325d4d6856b1 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -1314,6 +1314,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
"error loading logical volume descriptor: "
"Partition table too long (%u > %lu)\n", table_len,
sb->s_blocksize - sizeof(*lvd));
+ ret = 1;
goto out_bh;
}

@@ -1360,8 +1361,10 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
UDF_ID_SPARABLE,
strlen(UDF_ID_SPARABLE))) {
if (udf_load_sparable_map(sb, map,
- (struct sparablePartitionMap *)gpm) < 0)
+ (struct sparablePartitionMap *)gpm) < 0) {
+ ret = 1;
goto out_bh;
+ }
} else if (!strncmp(upm2->partIdent.ident,
UDF_ID_METADATA,
strlen(UDF_ID_METADATA))) {
--
1.8.5.2

2014-02-05 20:07:56

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 170/213] ext3: Fix error handling on inode bitmap corruption

From: Jan Kara <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 1415dd8705394399d59a3df1ab48d149e1e41e77 upstream.

When insert_inode_locked() fails in ext3_new_inode() it most likely
means inode bitmap got corrupted and we allocated again inode which
is already in use. Also doing unlock_new_inode() during error recovery
is wrong since inode does not have I_NEW set. Fix the problem by jumping
to fail: (instead of fail_drop:) which declares filesystem error and
does not call unlock_new_inode().

Reviewed-by: Eric Sandeen <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext3/ialloc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
index 0d0e97ed3ff6..fd16a928574c 100644
--- a/fs/ext3/ialloc.c
+++ b/fs/ext3/ialloc.c
@@ -575,8 +575,12 @@ got:
if (IS_DIRSYNC(inode))
handle->h_sync = 1;
if (insert_inode_locked(inode) < 0) {
- err = -EINVAL;
- goto fail_drop;
+ /*
+ * Likely a bitmap corruption causing inode to be allocated
+ * twice.
+ */
+ err = -EIO;
+ goto fail;
}
spin_lock(&sbi->s_next_gen_lock);
inode->i_generation = sbi->s_next_generation++;
--
1.8.5.2

2014-02-05 20:07:53

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 165/213] jbd: Fix lock ordering bug in journal_unmap_buffer()

From: Jan Kara <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 25389bb207987b5774182f763b9fb65ff08761c8 upstream.

Commit 09e05d48 introduced a wait for transaction commit into
journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly
we forgot to drop buffer lock before waiting for transaction commit and thus
deadlock is possible when kjournald wants to lock the buffer.

Fix the problem by dropping the buffer lock before waiting for transaction
commit. Since we are still holding page lock (and that is OK), buffer cannot
disappear under us.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd/transaction.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index bc8ab97dcd90..590e23885c98 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1956,7 +1956,9 @@ retry:
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
spin_unlock(&journal->j_state_lock);
+ unlock_buffer(bh);
log_wait_commit(journal, tid);
+ lock_buffer(bh);
goto retry;
}
/*
--
1.8.5.2

2014-02-05 20:17:38

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 189/213] udf: Fix data corruption for files in ICB

From: Jan Kara <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 9c2fc0de1a6e638fe58c354a463f544f42a90a09 upstream.

When a file is stored in ICB (inode), we overwrite part of the file, and
the page containing file's data is not in page cache, we end up corrupting
file's data by overwriting them with zeros. The problem is we use
simple_write_begin() which simply zeroes parts of the page which are not
written to. The problem has been introduced by be021ee4 (udf: convert to
new aops).

Fix the problem by providing a ->write_begin function which makes the page
properly uptodate.

Reported-by: Ian Abbott <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/udf/file.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/udf/file.c b/fs/udf/file.c
index 4b6a46ccbf46..af6f30ac228f 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -41,20 +41,24 @@
#include "udf_i.h"
#include "udf_sb.h"

-static int udf_adinicb_readpage(struct file *file, struct page *page)
+static void __udf_adinicb_readpage(struct page *page)
{
struct inode *inode = page->mapping->host;
char *kaddr;
struct udf_inode_info *iinfo = UDF_I(inode);

- BUG_ON(!PageLocked(page));
-
kaddr = kmap(page);
- memset(kaddr, 0, PAGE_CACHE_SIZE);
memcpy(kaddr, iinfo->i_ext.i_data + iinfo->i_lenEAttr, inode->i_size);
+ memset(kaddr + inode->i_size, 0, PAGE_CACHE_SIZE - inode->i_size);
flush_dcache_page(page);
SetPageUptodate(page);
kunmap(page);
+}
+
+static int udf_adinicb_readpage(struct file *file, struct page *page)
+{
+ BUG_ON(!PageLocked(page));
+ __udf_adinicb_readpage(page);
unlock_page(page);

return 0;
@@ -79,6 +83,25 @@ static int udf_adinicb_writepage(struct page *page,
return 0;
}

+static int udf_adinicb_write_begin(struct file *file,
+ struct address_space *mapping, loff_t pos,
+ unsigned len, unsigned flags, struct page **pagep,
+ void **fsdata)
+{
+ struct page *page;
+
+ if (WARN_ON_ONCE(pos >= PAGE_CACHE_SIZE))
+ return -EIO;
+ page = grab_cache_page_write_begin(mapping, 0, flags);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
+
+ if (!PageUptodate(page) && len != PAGE_CACHE_SIZE)
+ __udf_adinicb_readpage(page);
+ return 0;
+}
+
static int udf_adinicb_write_end(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
@@ -101,8 +124,8 @@ const struct address_space_operations udf_adinicb_aops = {
.readpage = udf_adinicb_readpage,
.writepage = udf_adinicb_writepage,
.sync_page = block_sync_page,
- .write_begin = simple_write_begin,
- .write_end = udf_adinicb_write_end,
+ .write_begin = udf_adinicb_write_begin,
+ .write_end = udf_adinicb_write_end,
};

static ssize_t udf_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
--
1.8.5.2

2014-02-05 20:17:37

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 186/213] udf: fix memory leak while allocating blocks during write

From: Namjae Jeon <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 2fb7d99d0de3fd8ae869f35ab682581d8455887a upstream.

Need to brelse the buffer_head stored in cur_epos and next_epos.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/udf/inode.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 8a3fbd177cab..fe64cf54b11e 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -654,6 +654,8 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block,
goal, err);
if (!newblocknum) {
brelse(prev_epos.bh);
+ brelse(cur_epos.bh);
+ brelse(next_epos.bh);
*err = -ENOSPC;
return NULL;
}
@@ -684,6 +686,8 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block,
udf_update_extents(inode, laarr, startnum, endnum, &prev_epos);

brelse(prev_epos.bh);
+ brelse(cur_epos.bh);
+ brelse(next_epos.bh);

newblock = udf_get_pblock(inode->i_sb, newblocknum,
iinfo->i_location.partitionReferenceNum, 0);
--
1.8.5.2

2014-02-05 20:18:22

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 188/213] udf: Fix bitmap overflow on large filesystems with small block size

From: Jan Kara <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 89b1f39eb4189de745fae554b0d614d87c8d5c63 upstream.

For large UDF filesystems with 512-byte blocks the number of necessary
bitmap blocks is larger than 2^16 so s_nr_groups in udf_bitmap overflows
(the number will overflow for filesystems larger than 128 GB with
512-byte blocks). That results in ENOSPC errors despite the filesystem
has plenty of free space.

Fix the problem by changing s_nr_groups' type to 'int'. That is enough
even for filesystems 2^32 blocks (UDF maximum) and 512-byte blocksize.

Reported-and-tested-by: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/udf/udf_sb.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
index d113b72c2768..efa82c91ec18 100644
--- a/fs/udf/udf_sb.h
+++ b/fs/udf/udf_sb.h
@@ -78,7 +78,7 @@ struct udf_virtual_data {
struct udf_bitmap {
__u32 s_extLength;
__u32 s_extPosition;
- __u16 s_nr_groups;
+ int s_nr_groups;
struct buffer_head **s_block_bitmap;
};

--
1.8.5.2

2014-02-05 20:18:19

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 187/213] udf: avoid info leak on export

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 0143fc5e9f6f5aad4764801015bc8d4b4a278200 upstream.

For type 0x51 the udf.parent_partref member in struct fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/udf/namei.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 75816025f95f..919fa1e5f761 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1366,6 +1366,7 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
*lenp = 3;
fid->udf.block = location.logicalBlockNum;
fid->udf.partref = location.partitionReferenceNum;
+ fid->udf.parent_partref = 0;
fid->udf.generation = inode->i_generation;

if (connectable && !S_ISDIR(inode->i_mode)) {
--
1.8.5.2

2014-02-05 20:07:46

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 163/213] isofs: avoid info leak on export

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit fe685aabf7c8c9f138e5ea900954d295bf229175 upstream.

For type 1 the parent_offset member in struct isofs_fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/isofs/export.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/isofs/export.c b/fs/isofs/export.c
index ed752cb38474..344aa606eecd 100644
--- a/fs/isofs/export.c
+++ b/fs/isofs/export.c
@@ -131,6 +131,7 @@ isofs_export_encode_fh(struct dentry *dentry,
len = 3;
fh32[0] = ei->i_iget5_block;
fh16[2] = (__u16)ei->i_iget5_offset; /* fh16 [sic] */
+ fh16[3] = 0; /* avoid leaking uninitialized data */
fh32[2] = inode->i_generation;
if (connectable && !S_ISDIR(inode->i_mode)) {
struct inode *parent;
--
1.8.5.2

2014-02-05 20:19:06

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 185/213] Btrfs: call the ordered free operation without any locks held

From: Chris Mason <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e9fbcb42201c862fd6ab45c48ead4f47bb2dea9d upstream.

Each ordered operation has a free callback, and this was called with the
worker spinlock held. Josef made the free callback also call iput,
which we can't do with the spinlock.

This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list. We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.

Signed-off-by: Chris Mason <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/btrfs/async-thread.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 462859a30141..474d1b82e38c 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -212,10 +212,17 @@ static noinline int run_ordered_completions(struct btrfs_workers *workers,

work->ordered_func(work);

- /* now take the lock again and call the freeing code */
+ /* now take the lock again and drop our item from the list */
spin_lock(&workers->order_lock);
list_del(&work->order_list);
+ spin_unlock(&workers->order_lock);
+
+ /*
+ * we don't want to call the ordered free functions
+ * with the lock held though
+ */
work->ordered_free(work);
+ spin_lock(&workers->order_lock);
}

spin_unlock(&workers->order_lock);
--
1.8.5.2

2014-02-05 20:19:03

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 183/213] ext4: fix error handling on inode bitmap corruption

From: Jan Kara <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit acd6ad83517639e8f09a8c5525b1dccd81cd2a10 upstream.

When insert_inode_locked() fails in ext4_new_inode() it most likely means inode
bitmap got corrupted and we allocated again inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:)
which declares filesystem error and does not call unlock_new_inode().

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/ialloc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 7f6b5826d5a6..4783c5a4adac 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1009,8 +1009,12 @@ got:
if (IS_DIRSYNC(inode))
ext4_handle_sync(handle);
if (insert_inode_locked(inode) < 0) {
- err = -EINVAL;
- goto fail_drop;
+ /*
+ * Likely a bitmap corruption causing inode to be allocated
+ * twice.
+ */
+ err = -EIO;
+ goto fail;
}
spin_lock(&sbi->s_next_gen_lock);
inode->i_generation = sbi->s_next_generation++;
--
1.8.5.2

2014-02-05 20:18:58

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 184/213] btrfs: use rcu_barrier() to wait for bdev puts at unmount

From: Eric Sandeen <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit bc178622d40d87e75abc131007342429c9b03351 upstream.

Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
__btrfs_close_devices
call_rcu(&device->rcu, free_device);
free_device
INIT_WORK(&device->rcu_work, __free_device);
schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.

Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/btrfs/volumes.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c04ebb1e4c98..7f75546941ac 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -565,6 +565,12 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
__btrfs_close_devices(fs_devices);
free_fs_devices(fs_devices);
}
+ /*
+ * Wait for rcu kworkers under __btrfs_close_devices
+ * to finish all blkdev_puts so device is really
+ * free when umount is done.
+ */
+ rcu_barrier();
return ret;
}

--
1.8.5.2

2014-02-05 20:19:57

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 182/213] ext4: avoid hang when mounting non-journal filesystems with orphan list

From: Theodore Ts'o <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 0e9a9a1ad619e7e987815d20262d36a2f95717ca upstream.

When trying to mount a file system which does not contain a journal,
but which does have a orphan list containing an inode which needs to
be truncated, the mount call with hang forever in
ext4_orphan_cleanup() because ext4_orphan_del() will return
immediately without removing the inode from the orphan list, leading
to an uninterruptible loop in kernel code which will busy out one of
the CPU's on the system.

This can be trivially reproduced by trying to mount the file system
found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs
source tree. If a malicious user were to put this on a USB stick, and
mount it on a Linux desktop which has automatic mounts enabled, this
could be considered a potential denial of service attack. (Not a big
deal in practice, but professional paranoids worry about such things,
and have even been known to allocate CVE numbers for such problems.)

Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Zheng Liu <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index d64e5f4f12ed..f501bdf9d4c1 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2081,7 +2081,8 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
struct ext4_iloc iloc;
int err = 0;

- if (!EXT4_SB(inode->i_sb)->s_journal)
+ if ((!EXT4_SB(inode->i_sb)->s_journal) &&
+ !(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS))
return 0;

mutex_lock(&EXT4_SB(inode->i_sb)->s_orphan_lock);
--
1.8.5.2

2014-02-05 20:19:55

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 181/213] ext4: make orphan functions be no-op in no-journal mode

From: Anatol Pomozov <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c9b92530a723ac5ef8e352885a1862b18f31b2f5 upstream.

Instead of checking whether the handle is valid, we check if journal
is enabled. This avoids taking the s_orphan_lock mutex in all cases
when there is no journal in use, including the error paths where
ext4_orphan_del() is called with a handle set to NULL.

Signed-off-by: Anatol Pomozov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/namei.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 683c0f9d8a83..d64e5f4f12ed 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2000,7 +2000,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
struct ext4_iloc iloc;
int err = 0, rc;

- if (!ext4_handle_valid(handle))
+ if (!EXT4_SB(sb)->s_journal)
return 0;

mutex_lock(&EXT4_SB(sb)->s_orphan_lock);
@@ -2081,8 +2081,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
struct ext4_iloc iloc;
int err = 0;

- /* ext4_handle_valid() assumes a valid handle_t pointer */
- if (handle && !ext4_handle_valid(handle))
+ if (!EXT4_SB(inode->i_sb)->s_journal)
return 0;

mutex_lock(&EXT4_SB(inode->i_sb)->s_orphan_lock);
@@ -2101,7 +2100,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
* transaction handle with which to update the orphan list on
* disk, but we still need to remove the inode from the linked
* list in memory. */
- if (sbi->s_journal && !handle)
+ if (!handle)
goto out;

err = ext4_reserve_inode_write(handle, inode, &iloc);
--
1.8.5.2

2014-02-05 20:19:54

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 180/213] ext4: limit group search loop for non-extent files

From: Lachlan McIlroy <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e6155736ad76b2070652745f9e54cdea3f0d8567 upstream.

In the case where we are allocating for a non-extent file,
we must limit the groups we allocate from to those below
2^32 blocks, and ext4_mb_regular_allocator() attempts to
do this initially by putting a cap on ngroups for the
subsequent search loop.

However, the initial target group comes in from the
allocation context (ac), and it may already be beyond
the artificially limited ngroups. In this case,
the limit

if (group == ngroups)
group = 0;

at the top of the loop is never true, and the loop will
run away.

Catch this case inside the loop and reset the search to
start at group 0.

[[email protected]: add commit msg & comments]

Signed-off-by: Lachlan McIlroy <[email protected]>
Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/mballoc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5e440caf82de..ac7889907361 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2068,7 +2068,11 @@ repeat:
group = ac->ac_g_ex.fe_group;

for (i = 0; i < ngroups; group++, i++) {
- if (group == ngroups)
+ /*
+ * Artificially restricted ngroups for non-extent
+ * files makes group > ngroups possible on first loop.
+ */
+ if (group >= ngroups)
group = 0;

/* This now checks without needing the buddy page */
--
1.8.5.2

2014-02-05 20:21:17

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 179/213] ext4: fix race in ext4_mb_add_n_trim()

From: Niu Yawei <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f1167009711032b0d747ec89a632a626c901a1ad upstream.

In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when
changing the lg_prealloc_list.

Signed-off-by: Niu Yawei <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/mballoc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f1c9a84c50a3..5e440caf82de 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4153,7 +4153,7 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
/* The max size of hash table is PREALLOC_TB_SIZE */
order = PREALLOC_TB_SIZE - 1;
/* Add the prealloc space to lg */
- rcu_read_lock();
+ spin_lock(&lg->lg_prealloc_lock);
list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[order],
pa_inode_list) {
spin_lock(&tmp_pa->pa_lock);
@@ -4177,12 +4177,12 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
if (!added)
list_add_tail_rcu(&pa->pa_inode_list,
&lg->lg_prealloc_list[order]);
- rcu_read_unlock();
+ spin_unlock(&lg->lg_prealloc_lock);

/* Now trim the list to be not more than 8 elements */
if (lg_prealloc_count > 8) {
ext4_mb_discard_lg_preallocations(sb, lg,
- order, lg_prealloc_count);
+ order, lg_prealloc_count);
return;
}
return ;
--
1.8.5.2

2014-02-05 20:21:15

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 176/213] ext4: always set i_op in ext4_mknod()

From: Bernd Schubert <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream.

ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.

Signed-off-by: Bernd Schubert <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/namei.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 41198b355a26..683c0f9d8a83 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1825,9 +1825,7 @@ retry:
err = PTR_ERR(inode);
if (!IS_ERR(inode)) {
init_special_inode(inode, inode->i_mode, rdev);
-#ifdef CONFIG_EXT4_FS_XATTR
inode->i_op = &ext4_special_inode_operations;
-#endif
err = ext4_add_nondir(handle, dentry, inode);
}
ext4_journal_stop(handle);
--
1.8.5.2

2014-02-05 20:07:40

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 154/213] USB: cdc-wdm: fix lockup on error in wdm_read

From: Bjørn Mork <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b086b6b10d9f182cd8d2f0dcfd7fd11edba93fc9 upstream.

Clear the WDM_READ flag on empty reads to avoid running
forever in an infinite tight loop, causing lockups:

Jul 1 21:58:11 nemi kernel: [ 3658.898647] qmi_wwan 2-1:1.2: Unexpected error -71
Jul 1 21:58:36 nemi kernel: [ 3684.072021] BUG: soft lockup - CPU#0 stuck for 23s! [qmi.pl:12235]
Jul 1 21:58:36 nemi kernel: [ 3684.072212] CPU 0
Jul 1 21:58:36 nemi kernel: [ 3684.072355]
Jul 1 21:58:36 nemi kernel: [ 3684.072367] Pid: 12235, comm: qmi.pl Tainted: P O 3.5.0-rc2+ #13 LENOVO 2776LEG/2776LEG
Jul 1 21:58:36 nemi kernel: [ 3684.072383] RIP: 0010:[<ffffffffa0635008>] [<ffffffffa0635008>] spin_unlock_irq+0x8/0xc [cdc_wdm]
Jul 1 21:58:36 nemi kernel: [ 3684.072388] RSP: 0018:ffff88022dca1e70 EFLAGS: 00000282
Jul 1 21:58:36 nemi kernel: [ 3684.072393] RAX: ffff88022fc3f650 RBX: ffffffff811c56f7 RCX: 00000001000ce8c1
Jul 1 21:58:36 nemi kernel: [ 3684.072398] RDX: 0000000000000010 RSI: 000000000267d810 RDI: ffff88022fc3f650
Jul 1 21:58:36 nemi kernel: [ 3684.072403] RBP: ffff88022dca1eb0 R08: ffffffffa063578e R09: 0000000000000000
Jul 1 21:58:36 nemi kernel: [ 3684.072407] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000002
Jul 1 21:58:36 nemi kernel: [ 3684.072412] R13: 0000000000000246 R14: ffffffff00000002 R15: ffff8802281d8c88
Jul 1 21:58:36 nemi kernel: [ 3684.072418] FS: 00007f666a260700(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
Jul 1 21:58:36 nemi kernel: [ 3684.072423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 1 21:58:36 nemi kernel: [ 3684.072428] CR2: 000000000270d9d8 CR3: 000000022e865000 CR4: 00000000000007f0
Jul 1 21:58:36 nemi kernel: [ 3684.072433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 1 21:58:36 nemi kernel: [ 3684.072438] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 1 21:58:36 nemi kernel: [ 3684.072444] Process qmi.pl (pid: 12235, threadinfo ffff88022dca0000, task ffff88022ff76380)
Jul 1 21:58:36 nemi kernel: [ 3684.072448] Stack:
Jul 1 21:58:36 nemi kernel: [ 3684.072458] ffffffffa063592e 0000000100020000 ffff88022fc3f650 ffff88022fc3f6a8
Jul 1 21:58:36 nemi kernel: [ 3684.072466] 0000000000000200 0000000100000000 000000000267d810 0000000000000000
Jul 1 21:58:36 nemi kernel: [ 3684.072475] 0000000000000000 ffff880212cfb6d0 0000000000000200 ffff880212cfb6c0
Jul 1 21:58:36 nemi kernel: [ 3684.072479] Call Trace:
Jul 1 21:58:36 nemi kernel: [ 3684.072489] [<ffffffffa063592e>] ? wdm_read+0x1a0/0x263 [cdc_wdm]
Jul 1 21:58:36 nemi kernel: [ 3684.072500] [<ffffffff8110adb7>] ? vfs_read+0xa1/0xfb
Jul 1 21:58:36 nemi kernel: [ 3684.072509] [<ffffffff81040589>] ? alarm_setitimer+0x35/0x64
Jul 1 21:58:36 nemi kernel: [ 3684.072517] [<ffffffff8110aec7>] ? sys_read+0x45/0x6e
Jul 1 21:58:36 nemi kernel: [ 3684.072525] [<ffffffff813725f9>] ? system_call_fastpath+0x16/0x1b
Jul 1 21:58:36 nemi kernel: [ 3684.072557] Code: <66> 66 90 c3 83 ff ed 89 f8 74 16 7f 06 83 ff a1 75 0a c3 83 ff f4

The WDM_READ flag is normally cleared by wdm_int_callback
before resubmitting the read urb, and set by wdm_in_callback
when this urb returns with data or an error. But a crashing
device may cause both a read error and cancelling all urbs.
Make sure that the flag is cleared by wdm_read if the buffer
is empty.

We don't clear the flag on errors, as there may be pending
data in the buffer which should be processed. The flag will
instead be cleared on the next wdm_read call.

Signed-off-by: Bjørn Mork <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/class/cdc-wdm.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index ce1af28e54ff..85e20efba9ca 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -466,6 +466,8 @@ retry:
}

if (!desc->reslength) { /* zero length read */
+ dev_dbg(&desc->intf->dev, "%s: zero length - clearing WDM_READ\n", __func__);
+ clear_bit(WDM_READ, &desc->flags);
spin_unlock_irq(&desc->iuspin);
goto retry;
}
--
1.8.5.2

2014-02-05 20:21:58

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 174/213] ext4: fix memory leak in ext4_xattr_set_acl()'s error path

From: Eugene Shatokhin <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 24ec19b0ae83a385ad9c55520716da671274b96c upstream.

In ext4_xattr_set_acl(), if ext4_journal_start() returns an error,
posix_acl_release() will not be called for 'acl' which may result in a
memory leak.

This patch fixes that.

Reviewed-by: Lukas Czerner <[email protected]>
Signed-off-by: Eugene Shatokhin <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/acl.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 8a2a29d35a6f..f14fa786dad0 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -442,8 +442,10 @@ ext4_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,

retry:
handle = ext4_journal_start(inode, EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
- if (IS_ERR(handle))
- return PTR_ERR(handle);
+ if (IS_ERR(handle)) {
+ error = PTR_ERR(handle);
+ goto release_and_out;
+ }
error = ext4_set_acl(handle, inode, type, acl);
ext4_journal_stop(handle);
if (error == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
--
1.8.5.2

2014-02-05 20:21:53

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 177/213] ext4: fix fdatasync() for files with only i_size changes

From: Jan Kara <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream.

Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.

Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.

Reported-by: Kristian Nielsen <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/inode.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 893da43223d4..658ca8d92ded 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5236,6 +5236,7 @@ static int ext4_do_update_inode(handle_t *handle,
struct ext4_inode_info *ei = EXT4_I(inode);
struct buffer_head *bh = iloc->bh;
int err = 0, rc, block;
+ int need_datasync = 0;

/* For fields not not tracking in the in-memory inode,
* initialise them to zero for new inodes. */
@@ -5284,7 +5285,10 @@ static int ext4_do_update_inode(handle_t *handle,
raw_inode->i_file_acl_high =
cpu_to_le16(ei->i_file_acl >> 32);
raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
- ext4_isize_set(raw_inode, ei->i_disksize);
+ if (ei->i_disksize != ext4_isize(raw_inode)) {
+ ext4_isize_set(raw_inode, ei->i_disksize);
+ need_datasync = 1;
+ }
if (ei->i_disksize > 0x7fffffffULL) {
struct super_block *sb = inode->i_sb;
if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
@@ -5337,7 +5341,7 @@ static int ext4_do_update_inode(handle_t *handle,
err = rc;
ext4_clear_inode_state(inode, EXT4_STATE_NEW);

- ext4_update_inode_fsync_trans(handle, inode, 0);
+ ext4_update_inode_fsync_trans(handle, inode, need_datasync);
out_brelse:
brelse(bh);
ext4_std_error(inode->i_sb, err);
--
1.8.5.2

2014-02-05 20:22:28

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 175/213] ext4: online defrag is not supported for journaled files

From: Dmitry Monakhov <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream.

Proper block swap for inodes with full journaling enabled is
truly non obvious task. In order to be on a safe side let's
explicitly disable it for now.

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/move_extent.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index deff4a5085e8..6764168776bf 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1209,7 +1209,12 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
orig_inode->i_ino, donor_inode->i_ino);
return -EINVAL;
}
-
+ /* TODO: This is non obvious task to swap blocks for inodes with full
+ jornaling enabled */
+ if (ext4_should_journal_data(orig_inode) ||
+ ext4_should_journal_data(donor_inode)) {
+ return -EINVAL;
+ }
/* Protect orig and donor inodes against a truncate */
ret1 = mext_inode_double_lock(orig_inode, donor_inode);
if (ret1 < 0)
--
1.8.5.2

2014-02-05 20:07:35

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 155/213] USB: serial: fix race between probe and open

From: Johan Hovold <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit a65a6f14dc24a90bde3f5d0073ba2364476200bf upstream.

Fix race between probe and open by making sure that the disconnected
flag is not cleared until all ports have been registered.

A call to tty_open while probe is running may get a reference to the
serial structure in serial_install before its ports have been
registered. This may lead to usb_serial_core calling driver open before
port is fully initialised.

With ftdi_sio this result in the following NULL-pointer dereference as
the private data has not been initialised at open:

[ 199.698286] IP: [<f811a089>] ftdi_open+0x59/0xe0 [ftdi_sio]
[ 199.698297] *pde = 00000000
[ 199.698303] Oops: 0000 [#1] PREEMPT SMP
[ 199.698313] Modules linked in: ftdi_sio usbserial
[ 199.698323]
[ 199.698327] Pid: 1146, comm: ftdi_open Not tainted 3.2.11 #70 Dell Inc. Vostro 1520/0T816J
[ 199.698339] EIP: 0060:[<f811a089>] EFLAGS: 00010286 CPU: 0
[ 199.698344] EIP is at ftdi_open+0x59/0xe0 [ftdi_sio]
[ 199.698348] EAX: 0000003e EBX: f5067000 ECX: 00000000 EDX: 80000600
[ 199.698352] ESI: f48d8800 EDI: 00000001 EBP: f515dd54 ESP: f515dcfc
[ 199.698356] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 199.698361] Process ftdi_open (pid: 1146, ti=f515c000 task=f481e040 task.ti=f515c000)
[ 199.698364] Stack:
[ 199.698368] f811a9fe f811a9e0 f811b3ef 00000000 00000000 00001388 00000000 f4a86800
[ 199.698387] 00000002 00000000 f806e68e 00000000 f532765c f481e040 00000246 22222222
[ 199.698479] 22222222 22222222 22222222 f5067004 f5327600 f5327638 f515dd74 f806e6ab
[ 199.698496] Call Trace:
[ 199.698504] [<f806e68e>] ? serial_activate+0x2e/0x70 [usbserial]
[ 199.698511] [<f806e6ab>] serial_activate+0x4b/0x70 [usbserial]
[ 199.698521] [<c126380c>] tty_port_open+0x7c/0xd0
[ 199.698527] [<f806e660>] ? serial_set_termios+0xa0/0xa0 [usbserial]
[ 199.698534] [<f806e76f>] serial_open+0x2f/0x70 [usbserial]
[ 199.698540] [<c125d07c>] tty_open+0x20c/0x510
[ 199.698546] [<c10e9eb7>] chrdev_open+0xe7/0x230
[ 199.698553] [<c10e48f2>] __dentry_open+0x1f2/0x390
[ 199.698559] [<c144bfec>] ? _raw_spin_unlock+0x2c/0x50
[ 199.698565] [<c10e4b76>] nameidata_to_filp+0x66/0x80
[ 199.698570] [<c10e9dd0>] ? cdev_put+0x20/0x20
[ 199.698576] [<c10f3e08>] do_last+0x198/0x730
[ 199.698581] [<c10f4440>] path_openat+0xa0/0x350
[ 199.698587] [<c10f47d5>] do_filp_open+0x35/0x80
[ 199.698593] [<c144bfec>] ? _raw_spin_unlock+0x2c/0x50
[ 199.698599] [<c10ff110>] ? alloc_fd+0xc0/0x100
[ 199.698605] [<c10f0b72>] ? getname_flags+0x72/0x120
[ 199.698611] [<c10e4450>] do_sys_open+0xf0/0x1c0
[ 199.698617] [<c11fcc08>] ? trace_hardirqs_on_thunk+0xc/0x10
[ 199.698623] [<c10e458e>] sys_open+0x2e/0x40
[ 199.698628] [<c144c990>] sysenter_do_call+0x12/0x36
[ 199.698632] Code: 85 89 00 00 00 8b 16 8b 4d c0 c1 e2 08 c7 44 24 14 88 13 00 00 81 ca 00 00 00 80 c7 44 24 10 00 00 00 00 c7 44 24 0c 00 00 00 00 <0f> b7 41 78 31 c9 89 44 24 08 c7 44 24 04 00 00 00 00 c7 04 24
[ 199.698884] EIP: [<f811a089>] ftdi_open+0x59/0xe0 [ftdi_sio] SS:ESP 0068:f515dcfc
[ 199.698893] CR2: 0000000000000078
[ 199.698925] ---[ end trace 77c43ec023940cff ]---

Reported-and-tested-by: Ken Huang <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/serial/usb-serial.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c
index b40884a4191d..561bf115619b 100644
--- a/drivers/usb/serial/usb-serial.c
+++ b/drivers/usb/serial/usb-serial.c
@@ -1040,6 +1040,12 @@ int usb_serial_probe(struct usb_interface *interface,
serial->attached = 1;
}

+ /* Avoid race with tty_open and serial_install by setting the
+ * disconnected flag and not clearing it until all ports have been
+ * registered.
+ */
+ serial->disconnected = 1;
+
if (get_free_serial(serial, num_ports, &minor) == NULL) {
dev_err(&interface->dev, "No more free serial devices\n");
goto probe_error;
@@ -1062,6 +1068,8 @@ int usb_serial_probe(struct usb_interface *interface,
}
}

+ serial->disconnected = 0;
+
usb_serial_console_init(debug, minor);

exit:
--
1.8.5.2

2014-02-05 20:22:45

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 173/213] ext4: don't dereference null pointer when make_indexed_dir() fails

From: Allison Henderson <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 6976a6f2acde2b0443cd64f1d08af90630e4ce81 upstream.

Fix for a null pointer bug found while running punch hole tests

Signed-off-by: Allison Henderson <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/namei.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index caa3c77f1743..41198b355a26 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1451,6 +1451,10 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
frame->at = entries;
frame->bh = bh;
bh = bh2;
+
+ ext4_handle_dirty_metadata(handle, dir, frame->bh);
+ ext4_handle_dirty_metadata(handle, dir, bh);
+
de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
if (!de) {
/*
@@ -1459,8 +1463,6 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
* with corrupted filesystem.
*/
ext4_mark_inode_dirty(handle, dir);
- ext4_handle_dirty_metadata(handle, dir, frame->bh);
- ext4_handle_dirty_metadata(handle, dir, bh);
dx_release(frames);
return retval;
}
--
1.8.5.2

2014-02-05 20:23:01

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 172/213] ext4: Fix fs corruption when make_indexed_dir() fails

From: Jan Kara <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 7ad8e4e6ae2a7c95445ee1715b1714106fb95037 upstream.

When make_indexed_dir() fails (e.g. because of ENOSPC) after it has
allocated block for index tree root, we did not properly mark all
changed buffers dirty. This lead to only some of these buffers being
written out and thus effectively corrupting the directory.

Fix the issue by marking all changed data dirty even in the error
failure case.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ext4/namei.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 2f31631935ba..caa3c77f1743 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1452,9 +1452,19 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
frame->bh = bh;
bh = bh2;
de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
- dx_release (frames);
- if (!(de))
+ if (!de) {
+ /*
+ * Even if the block split failed, we have to properly write
+ * out all the changes we did so far. Otherwise we can end up
+ * with corrupted filesystem.
+ */
+ ext4_mark_inode_dirty(handle, dir);
+ ext4_handle_dirty_metadata(handle, dir, frame->bh);
+ ext4_handle_dirty_metadata(handle, dir, bh);
+ dx_release(frames);
return retval;
+ }
+ dx_release(frames);

retval = add_dirent_to_buf(handle, dentry, inode, de, bh);
brelse(bh);
--
1.8.5.2

2014-02-05 20:23:24

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 167/213] ecryptfs: call vfs_setxattr() in ecryptfs_setxattr()

From: Roberto Sassu <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 48b512e6857139393cdfce26348c362b87537018 upstream.

Ecryptfs is a stackable filesystem which relies on lower filesystems the
ability of setting/getting extended attributes.

If there is a security module enabled on the system it updates the
'security' field of inodes according to the owned extended attribute set
with the function vfs_setxattr(). When this function is performed on a
ecryptfs filesystem the 'security' field is not updated for the lower
filesystem since the call security_inode_post_setxattr() is missing for
the lower inode.
Further, the call security_inode_setxattr() is missing for the lower inode,
leading to policy violations in the security module because specific
checks for this hook are not performed (i. e. filesystem
'associate' permission on SELinux is not checked for the lower filesystem).

This patch replaces the call of the setxattr() method of the lower inode
in the function ecryptfs_setxattr() with vfs_setxattr().

Signed-off-by: Roberto Sassu <[email protected]>
Cc: Dustin Kirkland <[email protected]>
Acked-by: James Morris <[email protected]>
Signed-off-by: Tyler Hicks <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ecryptfs/inode.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index b39e46f020a7..168706e90e3f 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -32,6 +32,7 @@
#include <linux/crypto.h>
#include <linux/fs_stack.h>
#include <linux/slab.h>
+#include <linux/xattr.h>
#include <asm/unaligned.h>
#include "ecryptfs_kernel.h"

@@ -1054,10 +1055,8 @@ ecryptfs_setxattr(struct dentry *dentry, const char *name, const void *value,
rc = -EOPNOTSUPP;
goto out;
}
- mutex_lock(&lower_dentry->d_inode->i_mutex);
- rc = lower_dentry->d_inode->i_op->setxattr(lower_dentry, name, value,
- size, flags);
- mutex_unlock(&lower_dentry->d_inode->i_mutex);
+
+ rc = vfs_setxattr(lower_dentry, name, value, size, flags);
out:
return rc;
}
--
1.8.5.2

2014-02-05 20:23:54

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 166/213] jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

From: Eric Sandeen <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 15291164b22a357cb211b618adfef4fa82fc0de3 upstream.

journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head
state ala discard_buffer(), but does not touch _Delay or _Unwritten as
discard_buffer() does.

This can be problematic in some areas of the ext4 code which assume
that if they have found a buffer marked unwritten or delay, then it's
a live one. Perhaps those spots should check whether it is mapped
as well, but if jbd2 is going to tear down a buffer, let's really
tear it down completely.

Without this I get some fsx failures on sub-page-block filesystems
up until v3.2, at which point 4e96b2dbbf1d7e81f22047a50f862555a6cb87cb
and 189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go
away, because buried within that large change is some more flag
clearing. I still think it's worth doing in jbd2, since
->invalidatepage leads here directly, and it's the right place
to clear away these flags.

Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd2/transaction.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index bfc70f57900f..ed89123a240f 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1836,6 +1836,8 @@ zap_buffer_unlocked:
clear_buffer_mapped(bh);
clear_buffer_req(bh);
clear_buffer_new(bh);
+ clear_buffer_delay(bh);
+ clear_buffer_unwritten(bh);
bh->b_bdev = NULL;
return may_free;
}
--
1.8.5.2

2014-02-05 20:23:51

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 168/213] eCryptfs: Copy up lower inode attrs after setting lower xattr

From: Tyler Hicks <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 545d680938be1e86a6c5250701ce9abaf360c495 upstream.

After passing through a ->setxattr() call, eCryptfs needs to copy the
inode attributes from the lower inode to the eCryptfs inode, as they
may have changed in the lower filesystem's ->setxattr() path.

One example is if an extended attribute containing a POSIX Access
Control List is being set. The new ACL may cause the lower filesystem to
modify the mode of the lower inode and the eCryptfs inode would need to
be updated to reflect the new mode.

https://launchpad.net/bugs/926292

Signed-off-by: Tyler Hicks <[email protected]>
Reported-by: Sebastien Bacher <[email protected]>
Cc: John Johansen <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/ecryptfs/inode.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 168706e90e3f..532b97bb1f15 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -1057,6 +1057,8 @@ ecryptfs_setxattr(struct dentry *dentry, const char *name, const void *value,
}

rc = vfs_setxattr(lower_dentry, name, value, size, flags);
+ if (!rc)
+ fsstack_copy_attr_all(dentry->d_inode, lower_dentry->d_inode);
out:
return rc;
}
--
1.8.5.2

2014-02-05 20:24:59

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 160/213] sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()

From: Geert Uytterhoeven <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 66081a72517a131430dcf986775f3268aafcb546 upstream.

The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.

Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/sysfs/dir.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 590717861c7a..37d7153d3f72 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -400,20 +400,18 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
/**
* sysfs_pathname - return full path to sysfs dirent
* @sd: sysfs_dirent whose path we want
- * @path: caller allocated buffer
+ * @path: caller allocated buffer of size PATH_MAX
*
* Gives the name "/" to the sysfs_root entry; any path returned
* is relative to wherever sysfs is mounted.
- *
- * XXX: does no error checking on @path size
*/
static char *sysfs_pathname(struct sysfs_dirent *sd, char *path)
{
if (sd->s_parent) {
sysfs_pathname(sd->s_parent, path);
- strcat(path, "/");
+ strlcat(path, "/", PATH_MAX);
}
- strcat(path, sd->s_name);
+ strlcat(path, sd->s_name, PATH_MAX);
return path;
}

@@ -446,9 +444,11 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
char *path = kzalloc(PATH_MAX, GFP_KERNEL);
WARN(1, KERN_WARNING
"sysfs: cannot create duplicate filename '%s'\n",
- (path == NULL) ? sd->s_name :
- strcat(strcat(sysfs_pathname(acxt->parent_sd, path), "/"),
- sd->s_name));
+ (path == NULL) ? sd->s_name
+ : (sysfs_pathname(acxt->parent_sd, path),
+ strlcat(path, "/", PATH_MAX),
+ strlcat(path, sd->s_name, PATH_MAX),
+ path));
kfree(path);
}

--
1.8.5.2

2014-02-05 20:25:01

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 161/213] tmpfs: fix use-after-free of mempolicy object

From: Greg Thelen <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 5f00110f7273f9ff04ac69a5f85bb535a4fd0987 upstream.

The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request. A new policy can be
specified if mpol=M is given.

Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.

To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
# mkdir /tmp/x

# mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

# mount -o remount,size=200M nodev /tmp/x

# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
# note ? garbage in mpol=... output above

# dd if=/dev/zero of=/tmp/x/f count=1
# panic here

Panic:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
Call Trace:
mpol_shared_policy_init+0xa5/0x160
shmem_get_inode+0x209/0x270
shmem_mknod+0x3e/0xf0
shmem_create+0x18/0x20
vfs_create+0xb5/0x130
do_last+0x9a1/0xea0
path_openat+0xb3/0x4d0
do_filp_open+0x42/0xa0
do_sys_open+0xfe/0x1e0
compat_sys_open+0x1b/0x20
cstar_dispatch+0x7/0x1f

Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault. Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.

The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:

config = *sbinfo
shmem_parse_options(data, &config, true)
mpol_put(sbinfo->mpol)
sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */

This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.

How far back does this issue go? I see it in both 2.6.36 and 3.3. I did
not look back further.

Signed-off-by: Greg Thelen <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
mm/shmem.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 0203cda3297a..f24ce93efc15 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2254,6 +2254,7 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
unsigned long inodes;
int error = -EINVAL;

+ config.mpol = NULL;
if (shmem_parse_options(data, &config, true))
return error;

@@ -2281,8 +2282,13 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
sbinfo->max_inodes = config.max_inodes;
sbinfo->free_inodes = config.max_inodes - inodes;

- mpol_put(sbinfo->mpol);
- sbinfo->mpol = config.mpol; /* transfers initial ref */
+ /*
+ * Preserve previous mempolicy unless mpol remount option was specified.
+ */
+ if (config.mpol) {
+ mpol_put(sbinfo->mpol);
+ sbinfo->mpol = config.mpol; /* transfers initial ref */
+ }
out:
spin_unlock(&sbinfo->stat_lock);
return error;
--
1.8.5.2

2014-02-05 20:24:57

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 164/213] jbd: Fix assertion failure in commit code due to lacking transaction credits

From: Jan Kara <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 09e05d4805e6c524c1af74e524e5d0528bb3fef3 upstream.

ext3 users of data=journal mode with blocksize < pagesize were occasionally
hitting assertion failure in journal_commit_transaction() checking whether the
transaction has at least as many credits reserved as buffers attached. The
core of the problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are left with
buffer_mapped set. When this happens to buffers beyond i_size attached to a
page stradding i_size, subsequent write extending the file will see these
buffers and as they are mapped (but underlying blocks were freed) things go
awry from here.

The assertion failure just coincidentally (and in this case luckily as we would
start corrupting filesystem) triggers due to journal_head not being properly
cleaned up as well.

Under some rare circumstances this bug could even hit data=ordered mode users.
There the assertion won't trigger and we would end up corrupting the
filesystem.

We fix the problem by unmapping buffers if possible (in lots of cases we just
need a buffer attached to a transaction as a place holder but it must not be
written out anyway). And in one case, we just have to bite the bullet and wait
for transaction commit to finish.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd/commit.c | 45 +++++++++++++++++++++++++++---------
fs/jbd/transaction.c | 64 ++++++++++++++++++++++++++++++++++++----------------
2 files changed, 78 insertions(+), 31 deletions(-)

diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index 1df9270c900b..3fb1656917aa 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -84,7 +84,12 @@ nope:
static void release_data_buffer(struct buffer_head *bh)
{
if (buffer_freed(bh)) {
+ WARN_ON_ONCE(buffer_dirty(bh));
clear_buffer_freed(bh);
+ clear_buffer_mapped(bh);
+ clear_buffer_new(bh);
+ clear_buffer_req(bh);
+ bh->b_bdev = NULL;
release_buffer_page(bh);
} else
put_bh(bh);
@@ -863,17 +868,35 @@ restart_loop:
* there's no point in keeping a checkpoint record for
* it. */

- /* A buffer which has been freed while still being
- * journaled by a previous transaction may end up still
- * being dirty here, but we want to avoid writing back
- * that buffer in the future after the "add to orphan"
- * operation been committed, That's not only a performance
- * gain, it also stops aliasing problems if the buffer is
- * left behind for writeback and gets reallocated for another
- * use in a different page. */
- if (buffer_freed(bh) && !jh->b_next_transaction) {
- clear_buffer_freed(bh);
- clear_buffer_jbddirty(bh);
+ /*
+ * A buffer which has been freed while still being journaled by
+ * a previous transaction.
+ */
+ if (buffer_freed(bh)) {
+ /*
+ * If the running transaction is the one containing
+ * "add to orphan" operation (b_next_transaction !=
+ * NULL), we have to wait for that transaction to
+ * commit before we can really get rid of the buffer.
+ * So just clear b_modified to not confuse transaction
+ * credit accounting and refile the buffer to
+ * BJ_Forget of the running transaction. If the just
+ * committed transaction contains "add to orphan"
+ * operation, we can completely invalidate the buffer
+ * now. We are rather throughout in that since the
+ * buffer may be still accessible when blocksize <
+ * pagesize and it is attached to the last partial
+ * page.
+ */
+ jh->b_modified = 0;
+ if (!jh->b_next_transaction) {
+ clear_buffer_freed(bh);
+ clear_buffer_jbddirty(bh);
+ clear_buffer_mapped(bh);
+ clear_buffer_new(bh);
+ clear_buffer_req(bh);
+ bh->b_bdev = NULL;
+ }
}

if (buffer_jbddirty(bh)) {
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 5ae71e75a491..bc8ab97dcd90 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1838,15 +1838,16 @@ static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction)
* We're outside-transaction here. Either or both of j_running_transaction
* and j_committing_transaction may be NULL.
*/
-static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
+static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
+ int partial_page)
{
transaction_t *transaction;
struct journal_head *jh;
int may_free = 1;
- int ret;

BUFFER_TRACE(bh, "entry");

+retry:
/*
* It is safe to proceed here without the j_list_lock because the
* buffers cannot be stolen by try_to_free_buffers as long as we are
@@ -1874,10 +1875,18 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
* clear the buffer dirty bit at latest at the moment when the
* transaction marking the buffer as freed in the filesystem
* structures is committed because from that moment on the
- * buffer can be reallocated and used by a different page.
+ * block can be reallocated and used by a different page.
* Since the block hasn't been freed yet but the inode has
* already been added to orphan list, it is safe for us to add
* the buffer to BJ_Forget list of the newest transaction.
+ *
+ * Also we have to clear buffer_mapped flag of a truncated buffer
+ * because the buffer_head may be attached to the page straddling
+ * i_size (can happen only when blocksize < pagesize) and thus the
+ * buffer_head can be reused when the file is extended again. So we end
+ * up keeping around invalidated buffers attached to transactions'
+ * BJ_Forget list just to stop checkpointing code from cleaning up
+ * the transaction this buffer was modified in.
*/
transaction = jh->b_transaction;
if (transaction == NULL) {
@@ -1904,13 +1913,9 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
* committed, the buffer won't be needed any
* longer. */
JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
- ret = __dispose_buffer(jh,
+ may_free = __dispose_buffer(jh,
journal->j_running_transaction);
- journal_put_journal_head(jh);
- spin_unlock(&journal->j_list_lock);
- jbd_unlock_bh_state(bh);
- spin_unlock(&journal->j_state_lock);
- return ret;
+ goto zap_buffer;
} else {
/* There is no currently-running transaction. So the
* orphan record which we wrote for this file must have
@@ -1918,13 +1923,9 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
* the committing transaction, if it exists. */
if (journal->j_committing_transaction) {
JBUFFER_TRACE(jh, "give to committing trans");
- ret = __dispose_buffer(jh,
+ may_free = __dispose_buffer(jh,
journal->j_committing_transaction);
- journal_put_journal_head(jh);
- spin_unlock(&journal->j_list_lock);
- jbd_unlock_bh_state(bh);
- spin_unlock(&journal->j_state_lock);
- return ret;
+ goto zap_buffer;
} else {
/* The orphan record's transaction has
* committed. We can cleanse this buffer */
@@ -1945,10 +1946,24 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
}
/*
* The buffer is committing, we simply cannot touch
- * it. So we just set j_next_transaction to the
- * running transaction (if there is one) and mark
- * buffer as freed so that commit code knows it should
- * clear dirty bits when it is done with the buffer.
+ * it. If the page is straddling i_size we have to wait
+ * for commit and try again.
+ */
+ if (partial_page) {
+ tid_t tid = journal->j_committing_transaction->t_tid;
+
+ journal_put_journal_head(jh);
+ spin_unlock(&journal->j_list_lock);
+ jbd_unlock_bh_state(bh);
+ spin_unlock(&journal->j_state_lock);
+ log_wait_commit(journal, tid);
+ goto retry;
+ }
+ /*
+ * OK, buffer won't be reachable after truncate. We just set
+ * j_next_transaction to the running transaction (if there is
+ * one) and mark buffer as freed so that commit code knows it
+ * should clear dirty bits when it is done with the buffer.
*/
set_buffer_freed(bh);
if (journal->j_running_transaction && buffer_jbddirty(bh))
@@ -1971,6 +1986,14 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
}

zap_buffer:
+ /*
+ * This is tricky. Although the buffer is truncated, it may be reused
+ * if blocksize < pagesize and it is attached to the page straddling
+ * EOF. Since the buffer might have been added to BJ_Forget list of the
+ * running transaction, journal_get_write_access() won't clear
+ * b_modified and credit accounting gets confused. So clear b_modified
+ * here. */
+ jh->b_modified = 0;
journal_put_journal_head(jh);
zap_buffer_no_jh:
spin_unlock(&journal->j_list_lock);
@@ -2019,7 +2042,8 @@ void journal_invalidatepage(journal_t *journal,
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
- may_free &= journal_unmap_buffer(journal, bh);
+ may_free &= journal_unmap_buffer(journal, bh,
+ offset > 0);
unlock_buffer(bh);
}
curr_off = next_off;
--
1.8.5.2

2014-02-05 20:25:59

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 162/213] fs/cifs/cifs_dfs_ref.c: fix potential memory leakage

From: Cong Ding <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 10b8c7dff5d3633b69e77f57d404dab54ead3787 upstream.

When it goes to error through line 144, the memory allocated to *devname is
not freed, and the caller doesn't free it either in line 250. So we free the
memroy of *devname in function cifs_compose_mount_options() when it goes to
error.

Signed-off-by: Cong Ding <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/cifs/cifs_dfs_ref.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/cifs/cifs_dfs_ref.c b/fs/cifs/cifs_dfs_ref.c
index 78e4d2a3a68b..61338373315e 100644
--- a/fs/cifs/cifs_dfs_ref.c
+++ b/fs/cifs/cifs_dfs_ref.c
@@ -227,6 +227,8 @@ compose_mount_options_out:
compose_mount_options_err:
kfree(mountdata);
mountdata = ERR_PTR(rc);
+ kfree(*devname);
+ *devname = NULL;
goto compose_mount_options_out;
}

--
1.8.5.2

2014-02-05 20:25:58

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 158/213] fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check

From: Kees Cook <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 12176503366885edd542389eed3aaf94be163fdb upstream.

The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check
while converting ioctl arguments. This could lead to leaking kernel
stack contents into userspace.

Patch extracted from existing fix in grsecurity.

Signed-off-by: Kees Cook <[email protected]>
Cc: David Miller <[email protected]>
Cc: Brad Spengler <[email protected]>
Cc: PaX Team <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/compat_ioctl.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 641640dc7ae5..7a00d9b155bf 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -227,6 +227,8 @@ static int do_video_set_spu_palette(unsigned int fd, unsigned int cmd,

err = get_user(palp, &up->palette);
err |= get_user(length, &up->length);
+ if (err)
+ return -EFAULT;

up_native = compat_alloc_user_space(sizeof(struct video_spu_palette));
err = put_user(compat_ptr(palp), &up_native->palette);
--
1.8.5.2

2014-02-05 20:26:43

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 159/213] fs/fscache/stats.c: fix memory leak

From: Anurup m <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit ec686c9239b4d472052a271c505d04dae84214cc upstream.

There is a kernel memory leak observed when the proc file
/proc/fs/fscache/stats is read.

The reason is that in fscache_stats_open, single_open is called and the
respective release function is not called during release. Hence fix
with correct release function - single_release().

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101

Signed-off-by: Anurup m <[email protected]>
Cc: shyju pv <[email protected]>
Cc: Sanil kumar <[email protected]>
Cc: Nataraj m <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/fscache/stats.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 4765190d537f..73c0bd7f7424 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -276,5 +276,5 @@ const struct file_operations fscache_stats_fops = {
.open = fscache_stats_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release,
+ .release = single_release,
};
--
1.8.5.2

2014-02-05 20:07:11

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 146/213] USB: garmin_gps: fix memory leak on disconnect

From: Johan Hovold <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 618aa1068df29c37a58045fe940f9106664153fd upstream.

Remove bogus disconnect test introduced by 95bef012e ("USB: more serial
drivers writing after disconnect") which prevented queued data from
being freed on disconnect.

The possible IO it was supposed to prevent is long gone.

Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/serial/garmin_gps.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/usb/serial/garmin_gps.c b/drivers/usb/serial/garmin_gps.c
index 0f0a122c6525..0e5838138f4b 100644
--- a/drivers/usb/serial/garmin_gps.c
+++ b/drivers/usb/serial/garmin_gps.c
@@ -973,10 +973,7 @@ static void garmin_close(struct usb_serial_port *port)
if (!serial)
return;

- mutex_lock(&port->serial->disc_mutex);
-
- if (!port->serial->disconnected)
- garmin_clear(garmin_data_p);
+ garmin_clear(garmin_data_p);

/* shutdown our urbs */
usb_kill_urb(port->read_urb);
@@ -985,8 +982,6 @@ static void garmin_close(struct usb_serial_port *port)
/* keep reset state so we know that we must start a new session */
if (garmin_data_p->state != STATE_RESET)
garmin_data_p->state = STATE_DISCONNECTED;
-
- mutex_unlock(&port->serial->disc_mutex);
}


--
1.8.5.2

2014-02-05 20:27:19

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 148/213] USB: cdc-wdm: fix buffer overflow

From: Oliver Neukum <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c0f5ecee4e741667b2493c742b60b6218d40b3aa upstream.

The buffer for responses must not overflow.
If this would happen, set a flag, drop the data and return
an error after user space has read all remaining data.

Signed-off-by: Oliver Neukum <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[PG: minor adjustment since RESET from 880442027569 isn't in .34]
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/class/cdc-wdm.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index 189141ca4e05..ce1af28e54ff 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -54,6 +54,7 @@ MODULE_DEVICE_TABLE (usb, wdm_ids);
#define WDM_POLL_RUNNING 6
#define WDM_RESPONDING 7
#define WDM_SUSPENDING 8
+#define WDM_OVERFLOW 10

#define WDM_MAX 16

@@ -114,6 +115,7 @@ static void wdm_in_callback(struct urb *urb)
{
struct wdm_device *desc = urb->context;
int status = urb->status;
+ int length = urb->actual_length;

spin_lock(&desc->iuspin);
clear_bit(WDM_RESPONDING, &desc->flags);
@@ -144,9 +146,17 @@ static void wdm_in_callback(struct urb *urb)
}

desc->rerr = status;
- desc->reslength = urb->actual_length;
- memmove(desc->ubuf + desc->length, desc->inbuf, desc->reslength);
- desc->length += desc->reslength;
+ if (length + desc->length > desc->wMaxCommand) {
+ /* The buffer would overflow */
+ set_bit(WDM_OVERFLOW, &desc->flags);
+ } else {
+ /* we may already be in overflow */
+ if (!test_bit(WDM_OVERFLOW, &desc->flags)) {
+ memmove(desc->ubuf + desc->length, desc->inbuf, length);
+ desc->length += length;
+ desc->reslength = length;
+ }
+ }
skip_error:
wake_up(&desc->wait);

@@ -410,6 +420,11 @@ retry:
rv = -ENODEV;
goto err;
}
+ if (test_bit(WDM_OVERFLOW, &desc->flags)) {
+ clear_bit(WDM_OVERFLOW, &desc->flags);
+ rv = -ENOBUFS;
+ goto err;
+ }
i++;
if (file->f_flags & O_NONBLOCK) {
if (!test_bit(WDM_READ, &desc->flags)) {
@@ -449,6 +464,7 @@ retry:
spin_unlock_irq(&desc->iuspin);
goto retry;
}
+
if (!desc->reslength) { /* zero length read */
spin_unlock_irq(&desc->iuspin);
goto retry;
@@ -860,6 +876,7 @@ static int wdm_post_reset(struct usb_interface *intf)
struct wdm_device *desc = usb_get_intfdata(intf);
int rv;

+ clear_bit(WDM_OVERFLOW, &desc->flags);
rv = recover_from_urb_loss(desc);
mutex_unlock(&desc->lock);
return 0;
--
1.8.5.2

2014-02-05 20:27:39

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 152/213] USB: echi-dbgp: increase the controller wait time to come out of halt.

From: Colin Ian King <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f96a4216e85050c0a9d41a41ecb0ae9d8e39b509 upstream.

The default 10 microsecond delay for the controller to come out of
halt in dbgp_ehci_startup is too short, so increase it to 1 millisecond.

This is based on emperical testing on various USB debug ports on
modern machines such as a Lenovo X220i and an Ivybridge development
platform that needed to wait ~450-950 microseconds.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Jason Wessel <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/early/ehci-dbgp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/early/ehci-dbgp.c b/drivers/usb/early/ehci-dbgp.c
index 6e98a3697844..e4c7f53dc56b 100644
--- a/drivers/usb/early/ehci-dbgp.c
+++ b/drivers/usb/early/ehci-dbgp.c
@@ -437,7 +437,7 @@ static int dbgp_ehci_startup(void)
writel(FLAG_CF, &ehci_regs->configured_flag);

/* Wait until the controller is no longer halted */
- loop = 10;
+ loop = 1000;
do {
status = readl(&ehci_regs->status);
if (!(status & STS_HALT))
--
1.8.5.2

2014-02-05 20:27:38

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 151/213] usb: serial: mos7840: Fixup mos7840_chars_in_buffer()

From: Mark Ferrell <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 5c263b92f828af6a8cf54041db45ceae5af8f2ab upstream.

* Use the buffer content length as opposed to the total buffer size. This can
be a real problem when using the mos7840 as a usb serial-console as all
kernel output is truncated during boot.

Signed-off-by: Mark Ferrell <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/serial/mos7840.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index c55d2aa35eaa..a87c43012b5a 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -1180,9 +1180,12 @@ static int mos7840_chars_in_buffer(struct tty_struct *tty)
}

spin_lock_irqsave(&mos7840_port->pool_lock, flags);
- for (i = 0; i < NUM_URBS; ++i)
- if (mos7840_port->busy[i])
- chars += URB_TRANSFER_BUFFER_SIZE;
+ for (i = 0; i < NUM_URBS; ++i) {
+ if (mos7840_port->busy[i]) {
+ struct urb *urb = mos7840_port->write_urb_pool[i];
+ chars += urb->transfer_buffer_length;
+ }
+ }
spin_unlock_irqrestore(&mos7840_port->pool_lock, flags);
dbg("%s - returns %d", __func__, chars);
return chars;
--
1.8.5.2

2014-02-05 20:29:12

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 149/213] USB: serial: ftdi_sio: Handle the old_termios == 0 case e.g. uart_resume_port()

From: Andrew Worsley <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c515598e0f5769916c31c00392cc2bfe6af74e55 upstream.

Handle null old_termios in ftdi_set_termios() calls from uart_resume_port().

Signed-off-by: Andrew Worsley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/serial/ftdi_sio.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index 646cc5326219..882af44bf8bb 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -2336,6 +2336,9 @@ static void ftdi_set_termios(struct tty_struct *tty,

cflag = termios->c_cflag;

+ if (old_termios == 0)
+ goto no_skip;
+
if (old_termios->c_cflag == termios->c_cflag
&& old_termios->c_ispeed == termios->c_ispeed
&& old_termios->c_ospeed == termios->c_ospeed)
@@ -2349,6 +2352,7 @@ static void ftdi_set_termios(struct tty_struct *tty,
(termios->c_cflag & (CSIZE|PARODD|PARENB|CMSPAR|CSTOPB)))
goto no_data_parity_stop_changes;

+no_skip:
/* Set number of data bits, parity, stop bits */

urb_value = 0;
--
1.8.5.2

2014-02-05 20:29:10

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 147/213] USB: io_ti: Fix NULL dereference in chase_port()

From: Wolfgang Frisch <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 1ee0a224bc9aad1de496c795f96bc6ba2c394811 upstream.

The tty is NULL when the port is hanging up.
chase_port() needs to check for this.

This patch is intended for stable series.
The behavior was observed and tested in Linux 3.2 and 3.7.1.

Johan Hovold submitted a more elaborate patch for the mainline kernel.

[ 56.277883] usb 1-1: edge_bulk_in_callback - nonzero read bulk status received: -84
[ 56.278811] usb 1-1: USB disconnect, device number 3
[ 56.278856] usb 1-1: edge_bulk_in_callback - stopping read!
[ 56.279562] BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
[ 56.280536] IP: [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[ 56.281212] PGD 1dc1b067 PUD 1e0f7067 PMD 0
[ 56.282085] Oops: 0002 [#1] SMP
[ 56.282744] Modules linked in:
[ 56.283512] CPU 1
[ 56.283512] Pid: 25, comm: khubd Not tainted 3.7.1 #1 innotek GmbH VirtualBox/VirtualBox
[ 56.283512] RIP: 0010:[<ffffffff8144e62a>] [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[ 56.283512] RSP: 0018:ffff88001fa99ab0 EFLAGS: 00010046
[ 56.283512] RAX: 0000000000000046 RBX: 00000000000001c8 RCX: 0000000000640064
[ 56.283512] RDX: 0000000000010000 RSI: ffff88001fa99b20 RDI: 00000000000001c8
[ 56.283512] RBP: ffff88001fa99b20 R08: 0000000000000000 R09: 0000000000000000
[ 56.283512] R10: 0000000000000000 R11: ffffffff812fcb4c R12: ffff88001ddf53c0
[ 56.283512] R13: 0000000000000000 R14: 00000000000001c8 R15: ffff88001e19b9f4
[ 56.283512] FS: 0000000000000000(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
[ 56.283512] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 56.283512] CR2: 00000000000001c8 CR3: 000000001dc51000 CR4: 00000000000006e0
[ 56.283512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 56.283512] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 56.283512] Process khubd (pid: 25, threadinfo ffff88001fa98000, task ffff88001fa94f80)
[ 56.283512] Stack:
[ 56.283512] 0000000000000046 00000000000001c8 ffffffff810578ec ffffffff812fcb4c
[ 56.283512] ffff88001e19b980 0000000000002710 ffffffff812ffe81 0000000000000001
[ 56.283512] ffff88001fa94f80 0000000000000202 ffffffff00000001 0000000000000296
[ 56.283512] Call Trace:
[ 56.283512] [<ffffffff810578ec>] ? add_wait_queue+0x12/0x3c
[ 56.283512] [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[ 56.283512] [<ffffffff812ffe81>] ? chase_port+0x84/0x2d6
[ 56.283512] [<ffffffff81063f27>] ? try_to_wake_up+0x199/0x199
[ 56.283512] [<ffffffff81263a5c>] ? tty_ldisc_hangup+0x222/0x298
[ 56.283512] [<ffffffff81300171>] ? edge_close+0x64/0x129
[ 56.283512] [<ffffffff810612f7>] ? __wake_up+0x35/0x46
[ 56.283512] [<ffffffff8106135b>] ? should_resched+0x5/0x23
[ 56.283512] [<ffffffff81264916>] ? tty_port_shutdown+0x39/0x44
[ 56.283512] [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[ 56.283512] [<ffffffff8125d38c>] ? __tty_hangup+0x307/0x351
[ 56.283512] [<ffffffff812e6ddc>] ? usb_hcd_flush_endpoint+0xde/0xed
[ 56.283512] [<ffffffff8144e625>] ? _raw_spin_lock_irqsave+0x14/0x35
[ 56.283512] [<ffffffff812fd361>] ? usb_serial_disconnect+0x57/0xc2
[ 56.283512] [<ffffffff812ea99b>] ? usb_unbind_interface+0x5c/0x131
[ 56.283512] [<ffffffff8128d738>] ? __device_release_driver+0x7f/0xd5
[ 56.283512] [<ffffffff8128d9cd>] ? device_release_driver+0x1a/0x25
[ 56.283512] [<ffffffff8128d393>] ? bus_remove_device+0xd2/0xe7
[ 56.283512] [<ffffffff8128b7a3>] ? device_del+0x119/0x167
[ 56.283512] [<ffffffff812e8d9d>] ? usb_disable_device+0x6a/0x180
[ 56.283512] [<ffffffff812e2ae0>] ? usb_disconnect+0x81/0xe6
[ 56.283512] [<ffffffff812e4435>] ? hub_thread+0x577/0xe82
[ 56.283512] [<ffffffff8144daa7>] ? __schedule+0x490/0x4be
[ 56.283512] [<ffffffff8105798f>] ? abort_exclusive_wait+0x79/0x79
[ 56.283512] [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[ 56.283512] [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[ 56.283512] [<ffffffff810570b4>] ? kthread+0x81/0x89
[ 56.283512] [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[ 56.283512] [<ffffffff8145387c>] ? ret_from_fork+0x7c/0xb0
[ 56.283512] [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[ 56.283512] Code: 8b 7c 24 08 e8 17 0b c3 ff 48 8b 04 24 48 83 c4 10 c3 53 48 89 fb 41 50 e8 e0 0a c3 ff 48 89 04 24 e8 e7 0a c3 ff ba 00 00 01 00
<f0> 0f c1 13 48 8b 04 24 89 d1 c1 ea 10 66 39 d1 74 07 f3 90 66
[ 56.283512] RIP [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[ 56.283512] RSP <ffff88001fa99ab0>
[ 56.283512] CR2: 00000000000001c8
[ 56.283512] ---[ end trace 49714df27e1679ce ]---

Signed-off-by: Wolfgang Frisch <[email protected]>
Cc: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/serial/io_ti.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/usb/serial/io_ti.c b/drivers/usb/serial/io_ti.c
index b6e8908b5080..2e2bcf230be3 100644
--- a/drivers/usb/serial/io_ti.c
+++ b/drivers/usb/serial/io_ti.c
@@ -581,6 +581,9 @@ static void chase_port(struct edgeport_port *port, unsigned long timeout,
wait_queue_t wait;
unsigned long flags;

+ if (!tty)
+ return;
+
if (!timeout)
timeout = (HZ * EDGE_CLOSING_WAIT)/100;

--
1.8.5.2

2014-02-05 20:31:59

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 142/213] USB: whiteheat: fix memory leak in error path

From: Johan Hovold <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c129197c99550d356cf5f69b046994dd53cd1b9d upstream.

Make sure command buffer is deallocated in case of errors during attach.

Cc: <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/serial/whiteheat.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c
index 12ed8209ca72..9bd51e9dc30a 100644
--- a/drivers/usb/serial/whiteheat.c
+++ b/drivers/usb/serial/whiteheat.c
@@ -576,6 +576,7 @@ no_firmware:
"%s: please contact [email protected]\n",
serial->type->description);
kfree(result);
+ kfree(command);
return -ENODEV;

no_command_private:
--
1.8.5.2

2014-02-05 20:31:53

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 141/213] USB: EHCI: go back to using the system clock for QH unlinks

From: Alan Stern <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 004c19682884d4f40000ce1ded53f4a1d0b18206 upstream.

This patch (as1477) fixes a problem affecting a few types of EHCI
controller. Contrary to what one might expect, these controllers
automatically stop their internal frame counter when no ports are
enabled. Since ehci-hcd currently relies on the frame counter for
determining when it should unlink QHs from the async schedule, those
controllers run into trouble: The frame counter stops and the QHs
never get unlinked.

Some systems have also experienced other problems traced back to
commit b963801164618e25fbdc0cd452ce49c3628b46c8 (USB: ehci-hcd unlink
speedups), which made the original switch from using the system clock
to using the frame counter. It never became clear what the reason was
for these problems, but evidently it is related to use of the frame
counter.

To fix all these problems, this patch more or less reverts that commit
and goes back to using the system clock. But this can't be done
cleanly because other changes have since been made to the scan_async()
subroutine. One of these changes involved the tricky logic that tries
to avoid rescanning QHs that have already been seen when the scanning
loop is restarted, which happens whenever an URB is given back.
Switching back to clock-based unlinks would make this logic even more
complicated.

Therefore the new code doesn't rescan the entire async list whenever a
giveback occurs. Instead it rescans only the current QH and continues
on from there. This requires the use of a separate pointer to keep
track of the next QH to scan, since the current QH may be unlinked
while the scanning is in progress. That new pointer must be global,
so that it can be adjusted forward whenever the _next_ QH gets
unlinked. (uhci-hcd uses this same trick.)

Simplification of the scanning loop removes a level of indentation,
which accounts for the size of the patch. The amount of code changed
is relatively small, and it isn't exactly a reversion of the
b963801164 commit.

This fixes Bugzilla #32432.

Signed-off-by: Alan Stern <[email protected]>
Tested-by: Matej Kenda <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/host/ehci-hcd.c | 8 ++---
drivers/usb/host/ehci-q.c | 82 ++++++++++++++++++++++-----------------------
drivers/usb/host/ehci.h | 3 +-
3 files changed, 45 insertions(+), 48 deletions(-)

diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index 5a320236ee3a..1674039cf683 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -84,7 +84,8 @@ static const char hcd_name [] = "ehci_hcd";
#define EHCI_IAA_MSECS 10 /* arbitrary */
#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */
#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */
-#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */
+#define EHCI_SHRINK_JIFFIES (DIV_ROUND_UP(HZ, 200) + 1)
+ /* 200-ms async qh unlink delay */

/* Initial IRQ latency: faster than hw default */
static int log2_irq_thresh = 0; // 0 to 6
@@ -139,10 +140,7 @@ timer_action(struct ehci_hcd *ehci, enum ehci_timer_action action)
break;
/* case TIMER_ASYNC_SHRINK: */
default:
- /* add a jiffie since we synch against the
- * 8 KHz uframe counter.
- */
- t = DIV_ROUND_UP(EHCI_SHRINK_FRAMES * HZ, 1000) + 1;
+ t = EHCI_SHRINK_JIFFIES;
break;
}
mod_timer(&ehci->watchdog, t + jiffies);
diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 9b46a1ee616f..1992abbe223d 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -1226,6 +1226,8 @@ static void start_unlink_async (struct ehci_hcd *ehci, struct ehci_qh *qh)

prev->hw->hw_next = qh->hw->hw_next;
prev->qh_next = qh->qh_next;
+ if (ehci->qh_scan_next == qh)
+ ehci->qh_scan_next = qh->qh_next.qh;
wmb ();

/* If the controller isn't running, we don't have to wait for it */
@@ -1251,53 +1253,49 @@ static void scan_async (struct ehci_hcd *ehci)
struct ehci_qh *qh;
enum ehci_timer_action action = TIMER_IO_WATCHDOG;

- ehci->stamp = ehci_readl(ehci, &ehci->regs->frame_index);
timer_action_done (ehci, TIMER_ASYNC_SHRINK);
-rescan:
stopped = !HC_IS_RUNNING(ehci_to_hcd(ehci)->state);
- qh = ehci->async->qh_next.qh;
- if (likely (qh != NULL)) {
- do {
- /* clean any finished work for this qh */
- if (!list_empty(&qh->qtd_list) && (stopped ||
- qh->stamp != ehci->stamp)) {
- int temp;
-
- /* unlinks could happen here; completion
- * reporting drops the lock. rescan using
- * the latest schedule, but don't rescan
- * qhs we already finished (no looping)
- * unless the controller is stopped.
- */
- qh = qh_get (qh);
- qh->stamp = ehci->stamp;
- temp = qh_completions (ehci, qh);
- if (qh->needs_rescan)
- unlink_async(ehci, qh);
- qh_put (qh);
- if (temp != 0) {
- goto rescan;
- }
- }

- /* unlink idle entries, reducing DMA usage as well
- * as HCD schedule-scanning costs. delay for any qh
- * we just scanned, there's a not-unusual case that it
- * doesn't stay idle for long.
- * (plus, avoids some kind of re-activation race.)
+ ehci->qh_scan_next = ehci->async->qh_next.qh;
+ while (ehci->qh_scan_next) {
+ qh = ehci->qh_scan_next;
+ ehci->qh_scan_next = qh->qh_next.qh;
+ rescan:
+ /* clean any finished work for this qh */
+ if (!list_empty(&qh->qtd_list)) {
+ int temp;
+
+ /*
+ * Unlinks could happen here; completion reporting
+ * drops the lock. That's why ehci->qh_scan_next
+ * always holds the next qh to scan; if the next qh
+ * gets unlinked then ehci->qh_scan_next is adjusted
+ * in start_unlink_async().
*/
- if (list_empty(&qh->qtd_list)
- && qh->qh_state == QH_STATE_LINKED) {
- if (!ehci->reclaim && (stopped ||
- ((ehci->stamp - qh->stamp) & 0x1fff)
- >= EHCI_SHRINK_FRAMES * 8))
- start_unlink_async(ehci, qh);
- else
- action = TIMER_ASYNC_SHRINK;
- }
+ qh = qh_get(qh);
+ temp = qh_completions(ehci, qh);
+ if (qh->needs_rescan)
+ unlink_async(ehci, qh);
+ qh->unlink_time = jiffies + EHCI_SHRINK_JIFFIES;
+ qh_put(qh);
+ if (temp != 0)
+ goto rescan;
+ }

- qh = qh->qh_next.qh;
- } while (qh);
+ /* unlink idle entries, reducing DMA usage as well
+ * as HCD schedule-scanning costs. delay for any qh
+ * we just scanned, there's a not-unusual case that it
+ * doesn't stay idle for long.
+ * (plus, avoids some kind of re-activation race.)
+ */
+ if (list_empty(&qh->qtd_list)
+ && qh->qh_state == QH_STATE_LINKED) {
+ if (!ehci->reclaim && (stopped ||
+ time_after_eq(jiffies, qh->unlink_time)))
+ start_unlink_async(ehci, qh);
+ else
+ action = TIMER_ASYNC_SHRINK;
+ }
}
if (action == TIMER_ASYNC_SHRINK)
timer_action (ehci, TIMER_ASYNC_SHRINK);
diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
index 1bb7a7f1ae77..d32d26e27f53 100644
--- a/drivers/usb/host/ehci.h
+++ b/drivers/usb/host/ehci.h
@@ -74,6 +74,7 @@ struct ehci_hcd { /* one per controller */
/* async schedule support */
struct ehci_qh *async;
struct ehci_qh *reclaim;
+ struct ehci_qh *qh_scan_next;
unsigned scanning : 1;

/* periodic schedule support */
@@ -116,7 +117,6 @@ struct ehci_hcd { /* one per controller */
struct timer_list iaa_watchdog;
struct timer_list watchdog;
unsigned long actions;
- unsigned stamp;
unsigned random_frame;
unsigned long next_statechange;
ktime_t last_periodic_enable;
@@ -336,6 +336,7 @@ struct ehci_qh {
struct ehci_qh *reclaim; /* next to reclaim */

struct ehci_hcd *ehci;
+ unsigned long unlink_time;

/*
* Do NOT use atomic operations for QH refcounting. On some CPUs
--
1.8.5.2

2014-02-05 20:32:34

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 145/213] USB: mos7840: fix port-device leak in error path

From: Johan Hovold <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 3eb55cc4ed88eee3b5230f66abcdbd2a91639eda upstream.

The driver set the usb-serial port pointers to NULL on errors in attach,
effectively preventing usb-serial core from decrementing the port ref
counters and releasing the port devices and associated data.

Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/serial/mos7840.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index d891d44501f8..c55d2aa35eaa 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -2565,7 +2565,6 @@ error:
kfree(mos7840_port->ctrl_buf);
usb_free_urb(mos7840_port->control_urb);
kfree(mos7840_port);
- serial->port[i] = NULL;
}
return status;
}
--
1.8.5.2

2014-02-05 20:32:31

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 143/213] USB: serial: Fix memory leak in sierra_release()

From: Lennart Sorensen <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f7bc5051667b74c3861f79eed98c60d5c3b883f7 upstream.

I found a memory leak in sierra_release() (well sierra_probe() I guess)
that looses 8 bytes each time the driver releases a device.

Signed-off-by: Len Sorensen <[email protected]>
Acked-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/serial/sierra.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/sierra.c b/drivers/usb/serial/sierra.c
index e3f32a41ef34..42ea133b0d57 100644
--- a/drivers/usb/serial/sierra.c
+++ b/drivers/usb/serial/sierra.c
@@ -980,6 +980,7 @@ static void sierra_release(struct usb_serial *serial)
continue;
kfree(portdata);
}
+ kfree(serial->private);
}

#ifdef CONFIG_PM
--
1.8.5.2

2014-02-05 20:07:00

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 139/213] xHCI: Correct the #define XHCI_LEGACY_DISABLE_SMI

From: Alex He <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 95018a53f7653e791bba1f54c8d75d9cb700d1bd upstream.

Re-define XHCI_LEGACY_DISABLE_SMI and used it in right way. All SMI enable
bits will be cleared to zero and flag bits 29:31 are also cleared to zero.
Other bits should be presvered as Table 146.

This patch should be backported to kernels as old as 2.6.31.

Signed-off-by: Alex He <[email protected]>
Signed-off-by: Sarah Sharp <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/host/pci-quirks.c | 10 +++++++---
drivers/usb/host/xhci-ext-caps.h | 5 +++--
2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index bfbc6b97eb8f..afc37d4879cc 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -466,9 +466,13 @@ static void __devinit quirk_usb_handoff_xhci(struct pci_dev *pdev)
}
}

- /* Disable any BIOS SMIs */
- writel(XHCI_LEGACY_DISABLE_SMI,
- base + ext_cap_offset + XHCI_LEGACY_CONTROL_OFFSET);
+ val = readl(base + ext_cap_offset + XHCI_LEGACY_CONTROL_OFFSET);
+ /* Mask off (turn off) any enabled SMIs */
+ val &= XHCI_LEGACY_DISABLE_SMI;
+ /* Mask all SMI events bits, RW1C */
+ val |= XHCI_LEGACY_SMI_EVENTS;
+ /* Disable any BIOS SMIs and clear all SMI events*/
+ writel(val, base + ext_cap_offset + XHCI_LEGACY_CONTROL_OFFSET);

hc_init:
op_reg_base = base + XHCI_HC_LENGTH(readl(base));
diff --git a/drivers/usb/host/xhci-ext-caps.h b/drivers/usb/host/xhci-ext-caps.h
index 78c4edac1db1..e2acc97b169b 100644
--- a/drivers/usb/host/xhci-ext-caps.h
+++ b/drivers/usb/host/xhci-ext-caps.h
@@ -62,8 +62,9 @@
/* USB Legacy Support Control and Status Register - section 7.1.2 */
/* Add this offset, plus the value of xECP in HCCPARAMS to the base address */
#define XHCI_LEGACY_CONTROL_OFFSET (0x04)
-/* bits 1:2, 5:12, and 17:19 need to be preserved; bits 21:28 should be zero */
-#define XHCI_LEGACY_DISABLE_SMI ((0x3 << 1) + (0xff << 5) + (0x7 << 17))
+/* bits 1:3, 5:12, and 17:19 need to be preserved; bits 21:28 should be zero */
+#define XHCI_LEGACY_DISABLE_SMI ((0x7 << 1) + (0xff << 5) + (0x7 << 17))
+#define XHCI_LEGACY_SMI_EVENTS (0x7 << 29)

/* command register values to disable interrupts and halt the HC */
/* start/stop HC execution - do not write unless HC is halted*/
--
1.8.5.2

2014-02-05 20:34:22

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 144/213] USB: mos7840: fix urb leak at release

From: Johan Hovold <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 65a4cdbb170e4ec1a7fa0e94936d47e24a17b0e8 upstream.

Make sure control urb is freed at release.

Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/serial/mos7840.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index 16f0548f5f3d..d891d44501f8 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -2632,6 +2632,7 @@ static void mos7840_release(struct usb_serial *serial)
mos7840_port = mos7840_get_port_private(serial->port[i]);
dbg("mos7840_port %d = %p", i, mos7840_port);
if (mos7840_port) {
+ usb_free_urb(mos7840_port->control_urb);
kfree(mos7840_port->ctrl_buf);
kfree(mos7840_port->dr);
kfree(mos7840_port);
--
1.8.5.2

2014-02-05 20:35:10

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 140/213] xhci: Don't write zeroed pointers to xHC registers.

From: Sarah Sharp <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 159e1fcc9a60fc7daba23ee8fcdb99799de3fe84 upstream.

When xhci_mem_cleanup() is called, we can't be sure if the xHC is
actually halted. We can ask the xHC to halt by writing to the RUN bit
in the command register, but that might timeout due to a HW hang.

If the host controller is still running, we should not write zeroed
values to the event ring dequeue pointers or base tables, the DCBAA
pointers, or the command ring pointers. Eric Fu reports his VIA VL800
host accesses the event ring pointers after a failed register restore on
resume from suspend. The hypothesis is that the host never actually
halted before the register write to change the event ring pointer to
zero.

Remove all writes of zeroed values to pointer registers in
xhci_mem_cleanup(). Instead, make all callers of the function reset the
host controller first, which will reset those registers to zero.
xhci_mem_init() is the only caller that doesn't first halt and reset the
host controller before calling xhci_mem_cleanup().

This should be backported to kernels as old as 2.6.32.

Signed-off-by: Sarah Sharp <[email protected]>
Tested-by: Elric Fu <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/host/xhci-mem.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index cb743a6bcfe4..e244e8cc5c1d 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1031,11 +1031,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
int i;

/* Free the Event Ring Segment Table and the actual Event Ring */
- if (xhci->ir_set) {
- xhci_writel(xhci, 0, &xhci->ir_set->erst_size);
- xhci_write_64(xhci, 0, &xhci->ir_set->erst_base);
- xhci_write_64(xhci, 0, &xhci->ir_set->erst_dequeue);
- }
size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries);
if (xhci->erst.entries)
pci_free_consistent(pdev, size,
@@ -1047,7 +1042,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
xhci->event_ring = NULL;
xhci_dbg(xhci, "Freed event ring\n");

- xhci_write_64(xhci, 0, &xhci->op_regs->cmd_ring);
xhci->cmd_ring_reserved_trbs = 0;
if (xhci->cmd_ring)
xhci_ring_free(xhci, xhci->cmd_ring);
@@ -1067,7 +1061,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
xhci->device_pool = NULL;
xhci_dbg(xhci, "Freed device context pool\n");

- xhci_write_64(xhci, 0, &xhci->op_regs->dcbaa_ptr);
if (xhci->dcbaa)
pci_free_consistent(pdev, sizeof(*xhci->dcbaa),
xhci->dcbaa, xhci->dcbaa->dma);
@@ -1403,6 +1396,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)

fail:
xhci_warn(xhci, "Couldn't initialize memory\n");
+ xhci_halt(xhci);
+ xhci_reset(xhci);
xhci_mem_cleanup(xhci);
return -ENOMEM;
}
--
1.8.5.2

2014-02-05 20:35:07

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 132/213] Bluetooth: L2CAP - Fix info leak via getsockname()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 792039c73cf176c8e39a6e8beef2c94ff46522ed upstream.

The L2CAP code fails to initialize the l2_bdaddr_type member of struct
sockaddr_l2 and the padding byte added for alignment. It that for leaks
two bytes kernel stack via the getsockname() syscall. Add an explicit
memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Marcel Holtmann <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Johan Hedberg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[PG: net/bluetooth/l2cap_sock.c --> net/bluetooth/l2cap.c in .34]
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/bluetooth/l2cap.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index 0b6cf87d5eb0..64ccd83d52a2 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -1191,6 +1191,7 @@ static int l2cap_sock_getname(struct socket *sock, struct sockaddr *addr, int *l

BT_DBG("sock %p, sk %p", sock, sk);

+ memset(la, 0, sizeof(struct sockaddr_l2));
addr->sa_family = AF_BLUETOOTH;
*len = sizeof(struct sockaddr_l2);

--
1.8.5.2

2014-02-05 20:35:55

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 138/213] xhci: Reset reserved command ring TRBs on cleanup.

From: Sarah Sharp <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 33b2831ac870d50cc8e01c317b07fb1e69c13fe1 upstream.

When the xHCI driver needs to clean up memory (perhaps due to a failed
register restore on resume from S3 or resume from S4), it needs to reset
the number of reserved TRBs on the command ring to zero. Otherwise,
several resume cycles (about 30) with a UAS device attached will
continually increment the number of reserved TRBs, until all command
submissions fail because there isn't enough room on the command ring.

This patch should be backported to kernels as old as 2.6.32,
that contain the commit 913a8a344ffcaf0b4a586d6662a2c66a7106557d
"USB: xhci: Change how xHCI commands are handled."

Signed-off-by: Sarah Sharp <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/host/xhci-mem.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 31cf540480fe..cb743a6bcfe4 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1048,6 +1048,7 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
xhci_dbg(xhci, "Freed event ring\n");

xhci_write_64(xhci, 0, &xhci->op_regs->cmd_ring);
+ xhci->cmd_ring_reserved_trbs = 0;
if (xhci->cmd_ring)
xhci_ring_free(xhci, xhci->cmd_ring);
xhci->cmd_ring = NULL;
--
1.8.5.2

2014-02-05 20:35:53

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 130/213] Bluetooth: RFCOMM - Fix info leak via getsockname()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 9344a972961d1a6d2c04d9008b13617bcb6ec2ef upstream.

The RFCOMM code fails to initialize the trailing padding byte of struct
sockaddr_rc added for alignment. It that for leaks one byte kernel stack
via the getsockname() syscall. Add an explicit memset(0) before filling
the structure to avoid the info leak.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Marcel Holtmann <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Johan Hedberg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/bluetooth/rfcomm/sock.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index b045bbbc2353..92aa7a012110 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -547,6 +547,7 @@ static int rfcomm_sock_getname(struct socket *sock, struct sockaddr *addr, int *

BT_DBG("sock %p, sk %p", sock, sk);

+ memset(sa, 0, sizeof(*sa));
sa->rc_family = AF_BLUETOOTH;
sa->rc_channel = rfcomm_pi(sk)->channel;
if (peer)
--
1.8.5.2

2014-02-05 20:06:49

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 126/213] crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption

From: Jussi Kivilinna <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 9efade1b3e981f5064f9db9ca971b4dc7557ae42 upstream.

cryptd_queue_worker attempts to prevent simultaneous accesses to crypto
workqueue by cryptd_enqueue_request using preempt_disable/preempt_enable.
However cryptd_enqueue_request might be called from softirq context,
so add local_bh_disable/local_bh_enable to prevent data corruption and
panics.

Bug report at http://marc.info/?l=linux-crypto-vger&m=134858649616319&w=2

v2:
- Disable software interrupts instead of hardware interrupts

Reported-by: Gurucharan Shetty <[email protected]>
Signed-off-by: Jussi Kivilinna <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
crypto/cryptd.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/crypto/cryptd.c b/crypto/cryptd.c
index ef71318976c7..6e241640219d 100644
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -116,13 +116,18 @@ static void cryptd_queue_worker(struct work_struct *work)
struct crypto_async_request *req, *backlog;

cpu_queue = container_of(work, struct cryptd_cpu_queue, work);
- /* Only handle one request at a time to avoid hogging crypto
- * workqueue. preempt_disable/enable is used to prevent
- * being preempted by cryptd_enqueue_request() */
+ /*
+ * Only handle one request at a time to avoid hogging crypto workqueue.
+ * preempt_disable/enable is used to prevent being preempted by
+ * cryptd_enqueue_request(). local_bh_disable/enable is used to prevent
+ * cryptd_enqueue_request() being accessed from software interrupts.
+ */
+ local_bh_disable();
preempt_disable();
backlog = crypto_get_backlog(&cpu_queue->queue);
req = crypto_dequeue_request(&cpu_queue->queue);
preempt_enable();
+ local_bh_enable();

if (!req)
return;
--
1.8.5.2

2014-02-05 20:36:33

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 136/213] xhci: Make handover code more robust

From: Matthew Garrett <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e955a1cd086de4d165ae0f4c7be7289d84b63bdc upstream.

My test platform (Intel DX79SI) boots reliably under BIOS, but frequently
crashes when booting via UEFI. I finally tracked this down to the xhci
handoff code. It seems that reads from the device occasionally just return
0xff, resulting in xhci_find_next_cap_offset generating a value that's
larger than the resource region. We then oops when attempting to read the
value. Sanity checking that value lets us avoid the crash.

I've no idea what's causing the underlying problem, and xhci still doesn't
actually *work* even with this, but the machine at least boots which will
probably make further debugging easier.

This should be backported to kernels as old as 2.6.31, that contain the
commit 66d4eadd8d067269ea8fead1a50fe87c2979a80d "USB: xhci: BIOS handoff
and HW initialization."

Signed-off-by: Matthew Garrett <[email protected]>
Signed-off-by: Sarah Sharp <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/host/pci-quirks.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index eae8b18437cb..bfbc6b97eb8f 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -418,12 +418,12 @@ static void __devinit quirk_usb_handoff_xhci(struct pci_dev *pdev)
void __iomem *op_reg_base;
u32 val;
int timeout;
+ int len = pci_resource_len(pdev, 0);

if (!mmio_resource_enabled(pdev, 0))
return;

- base = ioremap_nocache(pci_resource_start(pdev, 0),
- pci_resource_len(pdev, 0));
+ base = ioremap_nocache(pci_resource_start(pdev, 0), len);
if (base == NULL)
return;

@@ -433,9 +433,17 @@ static void __devinit quirk_usb_handoff_xhci(struct pci_dev *pdev)
*/
ext_cap_offset = xhci_find_next_cap_offset(base, XHCI_HCC_PARAMS_OFFSET);
do {
+ if ((ext_cap_offset + sizeof(val)) > len) {
+ /* We're reading garbage from the controller */
+ dev_warn(&pdev->dev,
+ "xHCI controller failing to respond");
+ return;
+ }
+
if (!ext_cap_offset)
/* We've reached the end of the extended capabilities */
goto hc_init;
+
val = readl(base + ext_cap_offset);
if (XHCI_EXT_CAPS_ID(val) == XHCI_EXT_CAPS_LEGACY)
break;
--
1.8.5.2

2014-02-05 20:36:31

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 131/213] Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e11e0455c0d7d3d62276a0c55d9dfbc16779d691 upstream.

If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().

Cc: Marcel Holtmann <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Johan Hedberg <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/bluetooth/rfcomm/sock.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 92aa7a012110..557122ee3e24 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -656,6 +656,7 @@ static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,

if (test_and_clear_bit(RFCOMM_DEFER_SETUP, &d->flags)) {
rfcomm_dlc_accept(d);
+ msg->msg_namelen = 0;
return 0;
}

--
1.8.5.2

2014-02-05 20:37:01

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 137/213] xhci: Increase reset timeout for Renesas 720201 host.

From: Sarah Sharp <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 22ceac191211cf6688b1bf6ecd93c8b6bf80ed9b upstream.

The NEC/Renesas 720201 xHCI host controller does not complete its reset
within 250 milliseconds. In fact, it takes about 9 seconds to reset the
host controller, and 1 second for the host to be ready for doorbell
rings. Extend the reset and CNR polling timeout to 10 seconds each.

This patch should be backported to kernels as old as 2.6.31, that
contain the commit 66d4eadd8d067269ea8fead1a50fe87c2979a80d "USB: xhci:
BIOS handoff and HW initialization."

Signed-off-by: Sarah Sharp <[email protected]>
Reported-by: Edwin Klein Mentink <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/usb/host/xhci.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 0a5901f607c6..d8ec9ec57d15 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -158,7 +158,7 @@ int xhci_reset(struct xhci_hcd *xhci)
xhci_to_hcd(xhci)->state = HC_STATE_HALT;

ret = handshake(xhci, &xhci->op_regs->command,
- CMD_RESET, 0, 250 * 1000);
+ CMD_RESET, 0, 10 * 1000 * 1000);
if (ret)
return ret;

@@ -167,7 +167,8 @@ int xhci_reset(struct xhci_hcd *xhci)
* xHCI cannot write to any doorbells or operational registers other
* than status until the "Controller Not Ready" flag is cleared.
*/
- return handshake(xhci, &xhci->op_regs->status, STS_CNR, 0, 250 * 1000);
+ return handshake(xhci, &xhci->op_regs->status,
+ STS_CNR, 0, 10 * 1000 * 1000);
}


--
1.8.5.2

2014-02-05 20:37:04

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 135/213] Bluetooth: hci_ldisc: fix NULL-pointer dereference on tty_close

From: Johan Hovold <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 33b69bf80a3704d45341928e4ff68b6ebd470686 upstream.

Do not close protocol driver until device has been unregistered.

This fixes a race between tty_close and hci_dev_open which can result in
a NULL-pointer dereference.

The line discipline closes the protocol driver while we may still have
hci_dev_open sleeping on the req_lock mutex resulting in a NULL-pointer
dereference when lock is acquired and hci_init_req called.

Bug is 100% reproducible using hciattach and a disconnected serial port:

0. # hciattach -n ttyO1 any noflow

1. hci_dev_open called from hci_power_on grabs req lock
2. hci_init_req executes but device fails to initialise (times out
eventually)
3. hci_dev_open is called from hci_sock_ioctl and sleeps on req lock
4. hci_uart_tty_close detaches protocol driver and cancels init req
5. hci_dev_open (1) releases req lock
6. hci_dev_open (3) grabs req lock, calls hci_init_req, which triggers oops
when request is prepared in hci_uart_send_frame

[ 137.201263] Unable to handle kernel NULL pointer dereference at virtual address 00000028
[ 137.209838] pgd = c0004000
[ 137.212677] [00000028] *pgd=00000000
[ 137.216430] Internal error: Oops: 17 [#1]
[ 137.220642] Modules linked in:
[ 137.223846] CPU: 0 Tainted: G W (3.3.0-rc6-dirty #406)
[ 137.230529] PC is at __lock_acquire+0x5c/0x1ab0
[ 137.235290] LR is at lock_acquire+0x9c/0x128
[ 137.239776] pc : [<c0071490>] lr : [<c00733f8>] psr: 20000093
[ 137.239776] sp : cf869dd8 ip : c0529554 fp : c051c730
[ 137.251800] r10: 00000000 r9 : cf8673c0 r8 : 00000080
[ 137.257293] r7 : 00000028 r6 : 00000002 r5 : 00000000 r4 : c053fd70
[ 137.264129] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : 00000001
[ 137.270965] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 137.278717] Control: 10c5387d Table: 8f0f4019 DAC: 00000015
[ 137.284729] Process kworker/u:1 (pid: 7, stack limit = 0xcf8682e8)
[ 137.291229] Stack: (0xcf869dd8 to 0xcf86a000)
[ 137.295776] 9dc0: c0529554 00000000
[ 137.304351] 9de0: cf8673c0 cf868000 d03ea1ef cf868000 000001ef 00000470 00000000 00000002
[ 137.312927] 9e00: cf8673c0 00000001 c051c730 c00716ec 0000000c 00000440 c0529554 00000001
[ 137.321533] 9e20: c051c730 cf868000 d03ea1f3 00000000 c053b978 00000000 00000028 cf868000
[ 137.330078] 9e40: 00000000 00000000 00000002 00000000 00000000 c00733f8 00000002 00000080
[ 137.338684] 9e60: 00000000 c02a1d50 00000000 00000001 60000013 c0969a1c 60000093 c053b96c
[ 137.347259] 9e80: 00000002 00000018 20000013 c02a1d50 cf0ac000 00000000 00000002 cf868000
[ 137.355834] 9ea0: 00000089 c0374130 00000002 00000000 c02a1d50 cf0ac000 0000000c cf0fc540
[ 137.364410] 9ec0: 00000018 c02a1d50 cf0fc540 00000000 cf0fc540 c0282238 c028220c cf178d80
[ 137.372985] 9ee0: 127525d8 c02821cc 9a1fa451 c032727c 9a1fa451 127525d8 cf0fc540 cf0ac4ec
[ 137.381561] 9f00: cf0ac000 cf0fc540 cf0ac584 c03285f4 c0328580 cf0ac4ec cf85c740 c05510cc
[ 137.390136] 9f20: ce825400 c004c914 00000002 00000000 c004c884 ce8254f5 cf869f48 00000000
[ 137.398712] 9f40: c0328580 ce825415 c0a7f914 c061af64 00000000 c048cf3c cf8673c0 cf85c740
[ 137.407287] 9f60: c05510cc c051a66c c05510ec c05510c4 cf85c750 cf868000 00000089 c004d6ac
[ 137.415863] 9f80: 00000000 c0073d14 00000001 cf853ed8 cf85c740 c004d558 00000013 00000000
[ 137.424438] 9fa0: 00000000 00000000 00000000 c00516b0 00000000 00000000 cf85c740 00000000
[ 137.433013] 9fc0: 00000001 dead4ead ffffffff ffffffff c0551674 00000000 00000000 c0450aa4
[ 137.441589] 9fe0: cf869fe0 cf869fe0 cf853ed8 c005162c c0013b30 c0013b30 00ffff00 00ffff00
[ 137.450164] [<c0071490>] (__lock_acquire+0x5c/0x1ab0) from [<c00733f8>] (lock_acquire+0x9c/0x128)
[ 137.459503] [<c00733f8>] (lock_acquire+0x9c/0x128) from [<c0374130>] (_raw_spin_lock_irqsave+0x44/0x58)
[ 137.469360] [<c0374130>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c02a1d50>] (skb_queue_tail+0x18/0x48)
[ 137.479339] [<c02a1d50>] (skb_queue_tail+0x18/0x48) from [<c0282238>] (h4_enqueue+0x2c/0x34)
[ 137.488189] [<c0282238>] (h4_enqueue+0x2c/0x34) from [<c02821cc>] (hci_uart_send_frame+0x34/0x68)
[ 137.497497] [<c02821cc>] (hci_uart_send_frame+0x34/0x68) from [<c032727c>] (hci_send_frame+0x50/0x88)
[ 137.507171] [<c032727c>] (hci_send_frame+0x50/0x88) from [<c03285f4>] (hci_cmd_work+0x74/0xd4)
[ 137.516204] [<c03285f4>] (hci_cmd_work+0x74/0xd4) from [<c004c914>] (process_one_work+0x1a0/0x4ec)
[ 137.525604] [<c004c914>] (process_one_work+0x1a0/0x4ec) from [<c004d6ac>] (worker_thread+0x154/0x344)
[ 137.535278] [<c004d6ac>] (worker_thread+0x154/0x344) from [<c00516b0>] (kthread+0x84/0x90)
[ 137.543975] [<c00516b0>] (kthread+0x84/0x90) from [<c0013b30>] (kernel_thread_exit+0x0/0x8)
[ 137.552734] Code: e59f4e5c e5941000 e3510000 0a000031 (e5971000)
[ 137.559234] ---[ end trace 1b75b31a2719ed1e ]---

Signed-off-by: Johan Hovold <[email protected]>
Acked-by: Marcel Holtmann <[email protected]>
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/bluetooth/hci_ldisc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 91be8d53d819..31c653a1005c 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -312,11 +312,11 @@ static void hci_uart_tty_close(struct tty_struct *tty)
hci_uart_close(hdev);

if (test_and_clear_bit(HCI_UART_PROTO_SET, &hu->flags)) {
- hu->proto->close(hu);
if (hdev) {
hci_unregister_dev(hdev);
hci_free_dev(hdev);
}
+ hu->proto->close(hu);
}
}
}
--
1.8.5.2

2014-02-05 20:37:41

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 129/213] Bluetooth: HCI - Fix info leak in getsockopt(HCI_FILTER)

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e15ca9a0ef9a86f0477530b0f44a725d67f889ee upstream.

The HCI code fails to initialize the two padding bytes of struct
hci_ufilter before copying it to userland -- that for leaking two
bytes kernel stack. Add an explicit memset(0) before filling the
structure to avoid the info leak.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Marcel Holtmann <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Johan Hedberg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/bluetooth/hci_sock.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 38f08f6b86f6..e5d788faf03b 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -583,6 +583,7 @@ static int hci_sock_getsockopt(struct socket *sock, int level, int optname, char
{
struct hci_filter *f = &hci_pi(sk)->filter;

+ memset(&uf, 0, sizeof(uf));
uf.type_mask = f->type_mask;
uf.opcode = f->opcode;
uf.event_mask[0] = *((u32 *) f->event_mask + 0);
--
1.8.5.2

2014-02-05 20:37:58

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [v2.6.34-stable 004/213] crypto: ghash - Avoid null pointer dereference if no key is set

On 14-02-05 03:30 PM, Nick Bowler wrote:
> On 2014-02-05 14:59 -0500, Paul Gortmaker wrote:
>> From: Nick Bowler <[email protected]>
>>
>> -------------------
>> This is a commit scheduled for the next v2.6.34 longterm release.
>> http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
>> If you see a problem with using this for longterm, please comment.
>> -------------------
>>
>> commit 7ed47b7d142ec99ad6880bbbec51e9f12b3af74c upstream.
>>
>> The ghash_update function passes a pointer to gf128mul_4k_lle which will
>> be NULL if ghash_setkey is not called or if the most recent call to
>> ghash_setkey failed to allocate memory. This causes an oops. Fix this
>> up by returning an error code in the null case.
>>
>> This is trivially triggered from unprivileged userspace through the
>> AF_ALG interface by simply writing to the socket without setting a key.
>
> After all this time, I see this patch still manages to find its way,
> occasionally, into the patch queue for older -stable. :)
>
> It should be harmless to apply, but this patch doesn't actually fix
> any real problem on kernels previous to 2.6.38 because the AF_ALG
> userspace interface does not exist in these kernels.

Thanks, I'll drop it then, since less churn in a stable release is
desired, where possible. I saw it tagged 2.6.37+, but then I also
saw it applied in 2.6.32.x, so I wasn't sure what to make of it.

Paul.
--

>
> Cheers,
>

2014-02-05 20:38:31

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 134/213] Bluetooth: add NULL pointer check in HCI

From: Jun Nie <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit d9319560b86839506c2011346b1f2e61438a3c73 upstream.

If we fail to find a hci device pointer in hci_uart, don't try
to deref the NULL one we do have.

Signed-off-by: Jun Nie <[email protected]>
Signed-off-by: Gustavo F. Padovan <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/bluetooth/hci_ldisc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 283b127cea74..91be8d53d819 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -313,8 +313,10 @@ static void hci_uart_tty_close(struct tty_struct *tty)

if (test_and_clear_bit(HCI_UART_PROTO_SET, &hu->flags)) {
hu->proto->close(hu);
- hci_unregister_dev(hdev);
- hci_free_dev(hdev);
+ if (hdev) {
+ hci_unregister_dev(hdev);
+ hci_free_dev(hdev);
+ }
}
}
}
--
1.8.5.2

2014-02-05 20:06:44

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 124/213] bnx2i: Fixed NULL ptr deference for 1G bnx2 Linux iSCSI offload

From: Eddie Wai <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit d6532207116307eb7ecbfa7b9e02c53230096a50 upstream.

This patch fixes the following kernel panic invoked by uninitialized fields
in the chip initialization for the 1G bnx2 iSCSI offload.

One of the bits in the chip initialization is being used by the latest
firmware to control overflow packets. When this control bit gets enabled
erroneously, it would ultimately result in a bad packet placement which would
cause the bnx2 driver to dereference a NULL ptr in the placement handler.

This can happen under certain stress I/O environment under the Linux
iSCSI offload operation.

This change only affects Broadcom's 5709 chipset.

Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
[<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
Pid: 0, comm: swapper Tainted: G ---- 2.6.18-333.el5debug #2
RIP: 0010:[<ffffffff881f0e7d>] [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
RSP: 0018:ffff8101b575bd50 EFLAGS: 00010216
RAX: 0000000000000005 RBX: ffff81007c5fb180 RCX: 0000000000000000
RDX: 0000000000000ffc RSI: 00000000817e8000 RDI: 0000000000000220
RBP: ffff81015bbd7ec0 R08: ffff8100817e9000 R09: 0000000000000000
R10: ffff81007c5fb180 R11: 00000000000000c8 R12: 000000007a25a010
R13: 0000000000000000 R14: 0000000000000005 R15: ffff810159f80558
FS: 0000000000000000(0000) GS:ffff8101afebc240(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006a0
Process swapper (pid: 0, threadinfo ffff8101b5754000, task ffff8101afebd820)
Stack: 000000000000000b ffff810159f80000 0000000000000040 ffff810159f80520
ffff810159f80500 00cf00cf8008e84b ffffc200100939e0 ffff810009035b20
0000502900000000 000000be00000001 ffff8100817e7810 00d08101b575bea8
Call Trace:
<IRQ> [<ffffffff8008e0d0>] show_schedstat+0x1c2/0x25b
[<ffffffff881f1886>] :bnx2:bnx2_poll+0xf6/0x231
[<ffffffff8000c9b9>] net_rx_action+0xac/0x1b1
[<ffffffff800125a0>] __do_softirq+0x89/0x133
[<ffffffff8005e30c>] call_softirq+0x1c/0x28
[<ffffffff8006d5de>] do_softirq+0x2c/0x7d
[<ffffffff8006d46e>] do_IRQ+0xee/0xf7
[<ffffffff8005d625>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff801a5780>] acpi_processor_idle_simple+0x1c5/0x341
[<ffffffff801a573d>] acpi_processor_idle_simple+0x182/0x341
[<ffffffff801a55bb>] acpi_processor_idle_simple+0x0/0x341
[<ffffffff80049560>] cpu_idle+0x95/0xb8
[<ffffffff80078b1c>] start_secondary+0x479/0x488

Signed-off-by: Eddie Wai <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/scsi/bnx2i/bnx2i_hwi.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
index 18352ff82101..cb288b7542e4 100644
--- a/drivers/scsi/bnx2i/bnx2i_hwi.c
+++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
@@ -1184,6 +1184,9 @@ int bnx2i_send_fw_iscsi_init_msg(struct bnx2i_hba *hba)
int rc = 0;
u64 mask64;

+ memset(&iscsi_init, 0x00, sizeof(struct iscsi_kwqe_init1));
+ memset(&iscsi_init2, 0x00, sizeof(struct iscsi_kwqe_init2));
+
bnx2i_adjust_qp_size(hba);

iscsi_init.flags =
--
1.8.5.2

2014-02-05 20:06:42

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 125/213] keys: fix race with concurrent install_user_keyrings()

From: David Howells <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 0da9dfdd2cd9889201bc6f6f43580c99165cd087 upstream.

This fixes CVE-2013-1792.

There is a race in install_user_keyrings() that can cause a NULL pointer
dereference when called concurrently for the same user if the uid and
uid-session keyrings are not yet created. It might be possible for an
unprivileged user to trigger this by calling keyctl() from userspace in
parallel immediately after logging in.

Assume that we have two threads both executing lookup_user_key(), both
looking for KEY_SPEC_USER_SESSION_KEYRING.

THREAD A THREAD B
=============================== ===============================
==>call install_user_keyrings();
if (!cred->user->session_keyring)
==>call install_user_keyrings()
...
user->uid_keyring = uid_keyring;
if (user->uid_keyring)
return 0;
<==
key = cred->user->session_keyring [== NULL]
user->session_keyring = session_keyring;
atomic_inc(&key->usage); [oops]

At the point thread A dereferences cred->user->session_keyring, thread B
hasn't updated user->session_keyring yet, but thread A assumes it is
populated because install_user_keyrings() returned ok.

The race window is really small but can be exploited if, for example,
thread B is interrupted or preempted after initializing uid_keyring, but
before doing setting session_keyring.

This couldn't be reproduced on a stock kernel. However, after placing
systemtap probe on 'user->session_keyring = session_keyring;' that
introduced some delay, the kernel could be crashed reliably.

Fix this by checking both pointers before deciding whether to return.
Alternatively, the test could be done away with entirely as it is checked
inside the mutex - but since the mutex is global, that may not be the best
way.

Signed-off-by: David Howells <[email protected]>
Reported-by: Mateusz Guzik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: James Morris <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
security/keys/process_keys.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index 20a38fed61b1..71c10cec3c18 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -55,7 +55,7 @@ int install_user_keyrings(void)

kenter("%p{%u}", user, user->uid);

- if (user->uid_keyring) {
+ if (user->uid_keyring && user->session_keyring) {
kleave(" = 0 [exist]");
return 0;
}
--
1.8.5.2

2014-02-05 20:39:30

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 122/213] fix Null pointer dereference on disk error

From: Xiaotian Feng <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 26cd4d65deba587f3cf2329b6869ce02bcbe68ec upstream.

Following oops were observed when disk error happened:

[ 4272.896937] sd 0:0:0:0: [sda] Unhandled error code
[ 4272.896939] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 4272.896942] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 5a de a7 00 00 08 00
[ 4272.896951] end_request: I/O error, dev sda, sector 5955239
[ 4291.574947] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 4291.658305] IP: [] ahci_activity_show+0x1/0x40
[ 4291.730090] PGD 76dbbc067 PUD 6c4fba067 PMD 0
[ 4291.783408] Oops: 0000 [#1] SMP
[ 4291.822100] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/sw_activity
[ 4291.934235] CPU 9
[ 4291.958301] Pid: 27942, comm: hwinfo ......

ata_scsi_find_dev could return NULL, so ata_scsi_activity_{show,store} should check if atadev is NULL.

Signed-off-by: Xiaotian Feng <[email protected]>
Cc: James Bottomley <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/ata/libata-scsi.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0dfa46877e39..191b375df95e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -339,7 +339,8 @@ ata_scsi_activity_show(struct device *dev, struct device_attribute *attr,
struct ata_port *ap = ata_shost_to_port(sdev->host);
struct ata_device *atadev = ata_scsi_find_dev(ap, sdev);

- if (ap->ops->sw_activity_show && (ap->flags & ATA_FLAG_SW_ACTIVITY))
+ if (atadev && ap->ops->sw_activity_show &&
+ (ap->flags & ATA_FLAG_SW_ACTIVITY))
return ap->ops->sw_activity_show(atadev, buf);
return -EINVAL;
}
@@ -354,7 +355,8 @@ ata_scsi_activity_store(struct device *dev, struct device_attribute *attr,
enum sw_activity val;
int rc;

- if (ap->ops->sw_activity_store && (ap->flags & ATA_FLAG_SW_ACTIVITY)) {
+ if (atadev && ap->ops->sw_activity_store &&
+ (ap->flags & ATA_FLAG_SW_ACTIVITY)) {
val = simple_strtoul(buf, NULL, 0);
switch (val) {
case OFF: case BLINK_ON: case BLINK_OFF:
--
1.8.5.2

2014-02-05 20:39:54

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 117/213] MCE: Fix vm86 handling for 32bit mce handler

From: Andi Kleen <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit a129a7c84582629741e5fa6f40026efcd7a65bd4 upstream.

When running on 32bit the mce handler could misinterpret
vm86 mode as ring 0. This can affect whether it does recovery
or not; it was possible to panic when recovery was actually
possible.

Fix this by always forcing vm86 to look like ring 3.

[ Backport to 3.0 notes:
Things changed there slightly:
- move mce_get_rip() up. It fills up m->cs and m->ip values which
are evaluated in mce_severity(). Therefore move it up right before
the mce_severity call. This seem to be another bug in 3.0?
- Place the backport (fix m->cs in V86 case) to where m->cs gets
filled which is mce_get_rip() in 3.0
]

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Thomas Renninger <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[PG: commit 8ef8fa7479fff9313387b873413f5ae233a2bd04 in v3.0.44]
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8a6f0afa767e..84b313c1297e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -453,6 +453,13 @@ static inline void mce_get_rip(struct mce *m, struct pt_regs *regs)
if (regs && (m->mcgstatus & (MCG_STATUS_RIPV|MCG_STATUS_EIPV))) {
m->ip = regs->ip;
m->cs = regs->cs;
+ /*
+ * When in VM86 mode make the cs look like ring 3
+ * always. This is a lie, but it's better than passing
+ * the additional vm86 bit around everywhere.
+ */
+ if (v8086_mode(regs))
+ m->cs |= 3;
} else {
m->ip = 0;
m->cs = 0;
@@ -990,6 +997,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
*/
add_taint(TAINT_MACHINE_CHECK);

+ mce_get_rip(&m, regs);
severity = mce_severity(&m, tolerant, NULL);

/*
@@ -1028,7 +1036,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
if (severity == MCE_AO_SEVERITY && mce_usable_address(&m))
mce_ring_add(m.addr >> PAGE_SHIFT);

- mce_get_rip(&m, regs);
mce_log(&m);

if (severity > worst) {
--
1.8.5.2

2014-02-05 20:40:16

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 127/213] IPoIB: Fix use-after-free of multicast object

From: Patrick McHardy <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit bea1e22df494a729978e7f2c54f7bda328f74bc3 upstream.

Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz)

Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().

In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries. This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.

Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called. To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().

Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 19 ++++++++++---------
2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b4b22576f12a..f6a23ecd982f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -157,7 +157,7 @@ static int ipoib_stop(struct net_device *dev)

netif_stop_queue(dev);

- ipoib_ib_dev_down(dev, 0);
+ ipoib_ib_dev_down(dev, 1);
ipoib_ib_dev_stop(dev, 0);

if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index b166bb75753d..917540c81260 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -189,7 +189,9 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,

mcast->mcmember = *mcmember;

- /* Set the cached Q_Key before we attach if it's the broadcast group */
+ /* Set the multicast MTU and cached Q_Key before we attach if it's
+ * the broadcast group.
+ */
if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid))) {
spin_lock_irq(&priv->lock);
@@ -197,10 +199,17 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
spin_unlock_irq(&priv->lock);
return -EAGAIN;
}
+ priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
spin_unlock_irq(&priv->lock);
priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
set_qkey = 1;
+
+ if (!ipoib_cm_admin_enabled(dev)) {
+ rtnl_lock();
+ dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
+ rtnl_unlock();
+ }
}

if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
@@ -589,14 +598,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
return;
}

- priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
-
- if (!ipoib_cm_admin_enabled(dev)) {
- rtnl_lock();
- dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
- rtnl_unlock();
- }
-
ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");

clear_bit(IPOIB_MCAST_RUN, &priv->flags);
--
1.8.5.2

2014-02-05 20:40:59

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 123/213] fix crash in scsi_dispatch_cmd()

From: James Bottomley <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit bfe159a51203c15d23cb3158fffdc25ec4b4dda1 upstream.

USB surprise removal of sr is triggering an oops in
scsi_dispatch_command(). What seems to be happening is that USB is
hanging on to a queue reference until the last close of the upper
device, so the crash is caused by surprise remove of a mounted CD
followed by attempted unmount.

The problem is that USB doesn't issue its final commands as part of
the SCSI teardown path, but on last close when the block queue is long
gone. The long term fix is probably to make sr do the teardown in the
same way as sd (so remove all the lower bits on ejection, but keep the
upper disk alive until last close of user space). However, the
current oops can be simply fixed by not allowing any commands to be
sent to a dead queue.

Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
block/blk-core.c | 3 +++
block/blk-exec.c | 7 +++++++
drivers/scsi/scsi_lib.c | 2 ++
3 files changed, 12 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 94f274bc9683..2ddcf96c854d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -865,6 +865,9 @@ struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
{
struct request *rq;

+ if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+ return NULL;
+
BUG_ON(rw != READ && rw != WRITE);

spin_lock_irq(q->queue_lock);
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 49557e91f0da..85bd7b445d86 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -50,6 +50,13 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
{
int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;

+ if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+ rq->errors = -ENXIO;
+ if (rq->end_io)
+ rq->end_io(rq, rq->errors);
+ return;
+ }
+
rq->rq_disk = bd_disk;
rq->end_io = done;
WARN_ON(irqs_disabled());
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ca8666b19c54..6712297407bb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -215,6 +215,8 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
int ret = DRIVER_ERROR << 24;

req = blk_get_request(sdev->request_queue, write, __GFP_WAIT);
+ if (!req)
+ return ret;

if (bufflen && blk_rq_map_kern(sdev->request_queue, req,
buffer, bufflen, __GFP_WAIT))
--
1.8.5.2

2014-02-05 20:41:04

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 120/213] drivers/char/ipmi: memcpy, need additional 2 bytes to avoid memory overflow

From: Chen Gang <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit a5f2b3d6a738e7d4180012fe7b541172f8c8dcea upstream.

When calling memcpy, read_data and write_data need additional 2 bytes.

write_data:
for checking: "if (size > IPMI_MAX_MSG_LENGTH)"
for operating: "memcpy(bt->write_data + 3, data + 1, size - 1)"

read_data:
for checking: "if (msg_len < 3 || msg_len > IPMI_MAX_MSG_LENGTH)"
for operating: "memcpy(data + 2, bt->read_data + 4, msg_len - 2)"

Signed-off-by: Chen Gang <[email protected]>
Signed-off-by: Corey Minyard <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/char/ipmi/ipmi_bt_sm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_bt_sm.c b/drivers/char/ipmi/ipmi_bt_sm.c
index 7b98c067190a..a65a574eac6b 100644
--- a/drivers/char/ipmi/ipmi_bt_sm.c
+++ b/drivers/char/ipmi/ipmi_bt_sm.c
@@ -95,9 +95,9 @@ struct si_sm_data {
enum bt_states state;
unsigned char seq; /* BT sequence number */
struct si_sm_io *io;
- unsigned char write_data[IPMI_MAX_MSG_LENGTH];
+ unsigned char write_data[IPMI_MAX_MSG_LENGTH + 2]; /* +2 for memcpy */
int write_count;
- unsigned char read_data[IPMI_MAX_MSG_LENGTH];
+ unsigned char read_data[IPMI_MAX_MSG_LENGTH + 2]; /* +2 for memcpy */
int read_count;
int truncated;
long timeout; /* microseconds countdown */
--
1.8.5.2

2014-02-05 20:06:27

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 080/213] clockevents: Don't allow dummy broadcast timers

From: Mark Rutland <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit a7dc19b8652c862d5b7c4d2339bd3c428bd29c4a upstream.

Currently tick_check_broadcast_device doesn't reject clock_event_devices
with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real
hardware if they have a higher rating value. In this situation, the
dummy timer is responsible for broadcasting to itself, and the core
clockevents code may attempt to call non-existent callbacks for
programming the dummy, eventually leading to a panic.

This patch makes tick_check_broadcast_device always reject dummy timers,
preventing this problem.

Signed-off-by: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: Jon Medhurst (Tixy) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/time/tick-broadcast.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 521987f85874..ab16ee785b7f 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -67,7 +67,8 @@ static void tick_broadcast_start_periodic(struct clock_event_device *bc)
*/
int tick_check_broadcast_device(struct clock_event_device *dev)
{
- if ((tick_broadcast_device.evtdev &&
+ if ((dev->features & CLOCK_EVT_FEAT_DUMMY) ||
+ (tick_broadcast_device.evtdev &&
tick_broadcast_device.evtdev->rating >= dev->rating) ||
(dev->features & CLOCK_EVT_FEAT_C3STOP))
return 0;
--
1.8.5.2

2014-02-05 20:41:56

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 121/213] w1: fix oops when w1_search is called from netlink connector

From: Marcin Jurkowski <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 9d1817cab2f030f6af360e961cc69bb1da8ad765 upstream.

On Sat, Mar 02, 2013 at 10:45:10AM +0100, Sven Geggus wrote:
> This is the bad commit I found doing git bisect:
> 04f482faf50535229a5a5c8d629cf963899f857c is the first bad commit
> commit 04f482faf50535229a5a5c8d629cf963899f857c
> Author: Patrick McHardy <[email protected]>
> Date: Mon Mar 28 08:39:36 2011 +0000

Good job. I was too lazy to bisect for bad commit;)

Reading the code I found problematic kthread_should_stop call from netlink
connector which causes the oops. After applying a patch, I've been testing
owfs+w1 setup for nearly two days and it seems to work very reliable (no
hangs, no memleaks etc).
More detailed description and possible fix is given below:

Function w1_search can be called from either kthread or netlink callback.
While the former works fine, the latter causes oops due to kthread_should_stop
invocation.

This patch adds a check if w1_search is serving netlink command, skipping
kthread_should_stop invocation if so.

Signed-off-by: Marcin Jurkowski <[email protected]>
Acked-by: Evgeniy Polyakov <[email protected]>
Cc: Josh Boyer <[email protected]>
Tested-by: Sven Geggus <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/w1/w1.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index ad5897dc4495..cf05f4c82baa 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -918,7 +918,8 @@ void w1_search(struct w1_master *dev, u8 search_type, w1_slave_found_callback cb
tmp64 = (triplet_ret >> 2);
rn |= (tmp64 << i);

- if (kthread_should_stop()) {
+ /* ensure we're called from kthread and not by netlink callback */
+ if (!dev->priv && kthread_should_stop()) {
dev_dbg(&dev->dev, "Abort w1_search\n");
return;
}
--
1.8.5.2

2014-02-05 20:42:43

by Nick Bowler

[permalink] [raw]
Subject: Re: [v2.6.34-stable 004/213] crypto: ghash - Avoid null pointer dereference if no key is set

On 2014-02-05 14:59 -0500, Paul Gortmaker wrote:
> From: Nick Bowler <[email protected]>
>
> -------------------
> This is a commit scheduled for the next v2.6.34 longterm release.
> http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
> If you see a problem with using this for longterm, please comment.
> -------------------
>
> commit 7ed47b7d142ec99ad6880bbbec51e9f12b3af74c upstream.
>
> The ghash_update function passes a pointer to gf128mul_4k_lle which will
> be NULL if ghash_setkey is not called or if the most recent call to
> ghash_setkey failed to allocate memory. This causes an oops. Fix this
> up by returning an error code in the null case.
>
> This is trivially triggered from unprivileged userspace through the
> AF_ALG interface by simply writing to the socket without setting a key.

After all this time, I see this patch still manages to find its way,
occasionally, into the patch queue for older -stable. :)

It should be harmless to apply, but this patch doesn't actually fix
any real problem on kernels previous to 2.6.38 because the AF_ALG
userspace interface does not exist in these kernels.

Cheers,
--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)

2014-02-05 20:43:40

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 105/213] mm: Hold a file reference in madvise_remove

From: Andy Lutomirski <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 9ab4233dd08036fe34a89c7dc6f47a8bf2eb29eb upstream.

Otherwise the code races with munmap (causing a use-after-free
of the vma) or with close (causing a use-after-free of the struct
file).

The bug was introduced by commit 90ed52ebe481 ("[PATCH] holepunch: fix
mmap_sem i_mutex deadlock")

[bwh: Backported to 3.2:
- Adjust context
- madvise_remove() calls vmtruncate_range(), not do_fallocate()]
[luto: Backported to 3.0: Adjust context]

Cc: Hugh Dickins <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Badari Pulavarty <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[PG: commit e12fcd38abe8a869cbabd77724008f1cf812a3e7 in v3.0.37]
Signed-off-by: Paul Gortmaker <[email protected]>
---
mm/madvise.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 319528b8db74..83aa92aad8bc 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -13,6 +13,7 @@
#include <linux/hugetlb.h>
#include <linux/sched.h>
#include <linux/ksm.h>
+#include <linux/file.h>

/*
* Any behaviour which results in changes to the vma->vm_flags needs to
@@ -191,14 +192,16 @@ static long madvise_remove(struct vm_area_struct *vma,
struct address_space *mapping;
loff_t offset, endoff;
int error;
+ struct file *f;

*prev = NULL; /* tell sys_madvise we drop mmap_sem */

if (vma->vm_flags & (VM_LOCKED|VM_NONLINEAR|VM_HUGETLB))
return -EINVAL;

- if (!vma->vm_file || !vma->vm_file->f_mapping
- || !vma->vm_file->f_mapping->host) {
+ f = vma->vm_file;
+
+ if (!f || !f->f_mapping || !f->f_mapping->host) {
return -EINVAL;
}

@@ -212,9 +215,16 @@ static long madvise_remove(struct vm_area_struct *vma,
endoff = (loff_t)(end - vma->vm_start - 1)
+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);

- /* vmtruncate_range needs to take i_mutex and i_alloc_sem */
+ /*
+ * vmtruncate_range may need to take i_mutex and i_alloc_sem.
+ * We need to explicitly grab a reference because the vma (and
+ * hence the vma's reference to the file) can go away as soon as
+ * we drop mmap_sem.
+ */
+ get_file(f);
up_read(&current->mm->mmap_sem);
error = vmtruncate_range(mapping->host, offset, endoff);
+ fput(f);
down_read(&current->mm->mmap_sem);
return error;
}
--
1.8.5.2

2014-02-05 20:43:37

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 118/213] ACPI / cpuidle: Fix NULL pointer issues when cpuidle is disabled

From: Konrad Rzeszutek Wilk <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b88a634a903d9670aa5f2f785aa890628ce0dece upstream.

If cpuidle is disabled, that means that:

per_cpu(acpi_cpuidle_device, pr->id)

is set to NULL as the acpi_processor_power_init ends up failing at

retval = cpuidle_register_driver(&acpi_idle_driver)

(in acpi_processor_power_init) and never sets the per_cpu idle
device. So when acpi_processor_hotplug on CPU online notification
tries to reference said device it crashes:

cpu 3 spinlock event irq 62
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffff81381013>] acpi_processor_setup_cpuidle_cx+0x3f/0x105
PGD a259b067 PUD ab38b067 PMD 0
Oops: 0002 [#1] SMP
odules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi libcrc32c crc32c nouveau mxm_wmi wmi radeon ttm sg sr_mod sd_mod cdrom ata_generic ata_piix libata crc32c_intel scsi_mod atl1c i915 fbcon tileblit font bitblit softcursor drm_kms_helper video xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs xen_privcmd mperf
CPU 1
Pid: 3047, comm: bash Not tainted 3.8.0-rc3upstream-00250-g165c029 #1 MSI MS-7680/H61M-P23 (MS-7680)
RIP: e030:[<ffffffff81381013>] [<ffffffff81381013>] acpi_processor_setup_cpuidle_cx+0x3f/0x105
RSP: e02b:ffff88001742dca8 EFLAGS: 00010202
RAX: 0000000000010be9 RBX: ffff8800a0a61800 RCX: ffff880105380000
RDX: 0000000000000003 RSI: 0000000000000200 RDI: ffff8800a0a61800
RBP: ffff88001742dce8 R08: ffffffff81812360 R09: 0000000000000200
R10: aaaaaaaaaaaaaaaa R11: 0000000000000001 R12: ffff8800a0a61800
R13: 00000000ffffff01 R14: 0000000000000000 R15: ffffffff81a907a0
FS: 00007fd6942f7700(0000) GS:ffff880105280000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 00000000a6773000 CR4: 0000000000042660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 3047, threadinfo ffff88001742c000, task ffff880017944000)
Stack:
0000000000000150 ffff880100f59e00 ffff88001742dcd8 ffff8800a0a61800
0000000000000000 00000000ffffff01 0000000000000000 ffffffff81a907a0
ffff88001742dd18 ffffffff813815b1 ffff88001742dd08 ffffffff810ae336
Call Trace:
[<ffffffff813815b1>] acpi_processor_hotplug+0x7c/0x9f
[<ffffffff810ae336>] ? schedule_delayed_work_on+0x16/0x20
[<ffffffff8137ee8f>] acpi_cpu_soft_notify+0x90/0xca
[<ffffffff8166023d>] notifier_call_chain+0x4d/0x70
[<ffffffff810bc369>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff81094a4b>] __cpu_notify+0x1b/0x30
[<ffffffff81652cf7>] _cpu_up+0x103/0x14b
[<ffffffff81652e18>] cpu_up+0xd9/0xec
[<ffffffff8164a254>] store_online+0x94/0xd0
[<ffffffff814122fb>] dev_attr_store+0x1b/0x20
[<ffffffff81216404>] sysfs_write_file+0xf4/0x170

This patch fixes it.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/acpi/processor_idle.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 346b758c47ed..10e6efe75daf 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1064,6 +1064,9 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
return -EINVAL;
}

+ if (!dev)
+ return -EINVAL;
+
dev->cpu = pr->id;
for (i = 0; i < CPUIDLE_STATE_MAX; i++) {
dev->states[i].name[0] = '\0';
--
1.8.5.2

2014-02-05 20:06:23

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 100/213] mm: fix vma_resv_map() NULL pointer

From: Dave Hansen <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 4523e1458566a0e8ecfaff90f380dd23acc44d27 upstream.

hugetlb_reserve_pages() can be used for either normal file-backed
hugetlbfs mappings, or MAP_HUGETLB. In the MAP_HUGETLB, semi-anonymous
mode, there is not a VMA around. The new call to resv_map_put() assumed
that there was, and resulted in a NULL pointer dereference:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: vma_resv_map+0x9/0x30
PGD 141453067 PUD 1421e1067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
...
Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ #36
RIP: vma_resv_map+0x9/0x30
...
Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0)
Call Trace:
resv_map_put+0xe/0x40
hugetlb_reserve_pages+0xa6/0x1d0
hugetlb_file_setup+0x102/0x2c0
newseg+0x115/0x360
ipcget+0x1ce/0x310
sys_shmget+0x5a/0x60
system_call_fastpath+0x16/0x1b

This was reported by Dave Jones, but was reproducible with the
libhugetlbfs test cases, so shame on me for not running them in the
first place.

With this, the oops is gone, and the output of libhugetlbfs's
run_tests.py is identical to plain 3.4 again.

[ Marked for stable, since this was introduced by commit c50ac050811d
("hugetlb: fix resv_map leak in error path") which was also marked for
stable ]

Reported-by: Dave Jones <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
mm/hugetlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f7b80540ba95..c80dd852f3a0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2789,7 +2789,8 @@ int hugetlb_reserve_pages(struct inode *inode,
region_add(&inode->i_mapping->private_list, from, to);
return 0;
out_err:
- resv_map_put(vma);
+ if (vma)
+ resv_map_put(vma);
return ret;
}

--
1.8.5.2

2014-02-05 20:44:31

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 116/213] KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)

From: Andy Honig <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit a2c118bfab8bc6b8bb213abfc35201e441693d55 upstream.

If the guest specifies a IOAPIC_REG_SELECT with an invalid value and follows
that with a read of the IOAPIC_REG_WINDOW KVM does not properly validate
that request. ioapic_read_indirect contains an
ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in
non-debug builds. In recent kernels this allows a guest to cause a kernel
oops by reading invalid memory. In older kernels (pre-3.3) this allows a
guest to read from large ranges of host memory.

Tested: tested against apic unit tests.

Signed-off-by: Andrew Honig <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
virt/kvm/ioapic.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 3500dee9cf2b..57afcdaa2863 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -72,9 +72,12 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
u64 redir_content;

- ASSERT(redir_index < IOAPIC_NUM_PINS);
+ if (redir_index < IOAPIC_NUM_PINS)
+ redir_content =
+ ioapic->redirtbl[redir_index].bits;
+ else
+ redir_content = ~0ULL;

- redir_content = ioapic->redirtbl[redir_index].bits;
result = (ioapic->ioregsel & 0x1) ?
(redir_content >> 32) & 0xffffffff :
redir_content & 0xffffffff;
--
1.8.5.2

2014-02-05 20:44:51

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 112/213] x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

From: Samu Kallio <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 1160c2779b826c6f5c08e5cc542de58fd1f667d5 upstream.

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The OOPs usually looks as so:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>] [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
[<ffffffff81627759>] do_page_fault+0x399/0x4b0
[<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
[<ffffffff81624065>] page_fault+0x25/0x30
[<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
[<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
[<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
[<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
[<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
[<ffffffff81153e61>] unmap_single_vma+0x531/0x870
[<ffffffff81154962>] unmap_vmas+0x52/0xa0
[<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
[<ffffffff8115c8f8>] exit_mmap+0x98/0x170
[<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[<ffffffff81059ce3>] mmput+0x83/0xf0
[<ffffffff810624c4>] exit_mm+0x104/0x130
[<ffffffff8106264a>] do_exit+0x15a/0x8c0
[<ffffffff810630ff>] do_group_exit+0x3f/0xa0
[<ffffffff81063177>] sys_exit_group+0x17/0x20
[<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Tested-by: Josh Boyer <[email protected]>
Reported-and-Tested-by: Krishna Raman <[email protected]>
Signed-off-by: Samu Kallio <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Tested-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/mm/fault.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 544ed251a40c..7c96106039a1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -379,10 +379,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
if (pgd_none(*pgd_ref))
return -1;

- if (pgd_none(*pgd))
+ if (pgd_none(*pgd)) {
set_pgd(pgd, *pgd_ref);
- else
+ arch_flush_lazy_mmu_mode();
+ } else {
BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+ }

/*
* Below here mismatches are bugs because these lower tables
--
1.8.5.2

2014-02-05 20:44:49

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 115/213] KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)

From: Andy Honig <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c300aa64ddf57d9c5d9c898a64b36877345dd4a9 upstream.

If the guest sets the GPA of the time_page so that the request to update the
time straddles a page then KVM will write onto an incorrect page. The
write is done byusing kmap atomic to get a pointer to the page for the time
structure and then performing a memcpy to that page starting at an offset
that the guest controls. Well behaved guests always provide a 32-byte aligned
address, however a malicious guest could use this to corrupt host kernel
memory.

Tested: Tested against kvmclock unit test.

Signed-off-by: Andrew Honig <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/kvm/x86.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c1e586d82c1d..65f9c0c45312 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1152,6 +1152,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
/* ...but clean it before doing the actual write */
vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);

+ /* Check that the address is 32-byte aligned. */
+ if (vcpu->arch.time_offset &
+ (sizeof(struct pvclock_vcpu_time_info) - 1))
+ break;
+
vcpu->arch.time_page =
gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);

--
1.8.5.2

2014-02-05 20:45:33

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 109/213] x86/msr: Add capabilities check

From: Alan Cox <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c903f0456bc69176912dee6dd25c6a66ee1aed00 upstream.

At the moment the MSR driver only relies upon file system
checks. This means that anything as root with any capability set
can write to MSRs. Historically that wasn't very interesting but
on modern processors the MSRs are such that writing to them
provides several ways to execute arbitary code in kernel space.
Sample code and documentation on doing this is circulating and
MSR attacks are used on Windows 64bit rootkits already.

In the Linux case you still need to be able to open the device
file so the impact is fairly limited and reduces the security of
some capability and security model based systems down towards
that of a generic "root owns the box" setup.

Therefore they should require CAP_SYS_RAWIO to prevent an
elevation of capabilities. The impact of this is fairly minimal
on most setups because they don't have heavy use of
capabilities. Those using SELinux, SMACK or AppArmor rules might
want to consider if their rulesets on the MSR driver could be
tighter.

Signed-off-by: Alan Cox <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/kernel/msr.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 4d4468e9f47c..56b77410a9fe 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -176,6 +176,9 @@ static int msr_open(struct inode *inode, struct file *file)
unsigned int cpu;
struct cpuinfo_x86 *c;

+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
cpu = iminor(file->f_path.dentry->d_inode);
if (cpu >= nr_cpu_ids || !cpu_online(cpu))
return -ENXIO; /* No such CPU */
--
1.8.5.2

2014-02-05 20:45:37

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 101/213] mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED

From: Christoffer Dall <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit ad4b3fb7ff9940bcdb1e4cd62bd189d10fa636ba upstream.

Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and
(PageHead) is true, for tail pages. If this is indeed the intended
behavior, which I doubt because it breaks cache cleaning on some ARM
systems, then the nomenclature is highly problematic.

This patch makes sure PageHead is only true for head pages and PageTail
is only true for tail pages, and neither is true for non-compound pages.

[ This buglet seems ancient - seems to have been introduced back in Apr
2008 in commit 6a1e7f777f61: "pageflags: convert to the use of new
macros". And the reason nobody noticed is because the PageHead()
tests are almost all about just sanity-checking, and only used on
pages that are actual page heads. The fact that the old code returned
true for tail pages too was thus not really noticeable. - Linus ]

Signed-off-by: Christoffer Dall <[email protected]>
Acked-by: Andrea Arcangeli <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
include/linux/page-flags.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5b59f35dcb8f..040ce385f262 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -362,7 +362,7 @@ static inline int PageCompound(struct page *page)
* pages on the LRU and/or pagecache.
*/
TESTPAGEFLAG(Compound, compound)
-__PAGEFLAG(Head, compound)
+__SETPAGEFLAG(Head, compound) __CLEARPAGEFLAG(Head, compound)

/*
* PG_reclaim is used in combination with PG_compound to mark the
@@ -374,8 +374,14 @@ __PAGEFLAG(Head, compound)
* PG_compound & PG_reclaim => Tail page
* PG_compound & ~PG_reclaim => Head page
*/
+#define PG_head_mask ((1L << PG_compound))
#define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))

+static inline int PageHead(struct page *page)
+{
+ return ((page->flags & PG_head_tail_mask) == PG_head_mask);
+}
+
static inline int PageTail(struct page *page)
{
return ((page->flags & PG_head_tail_mask) == PG_head_tail_mask);
--
1.8.5.2

2014-02-05 20:45:29

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 111/213] x86/mm: Check if PUD is large when validating a kernel address

From: Mel Gorman <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 0ee364eb316348ddf3e0dfcd986f5f13f528f821 upstream.

A user reported the following oops when a backup process reads
/proc/kcore:

BUG: unable to handle kernel paging request at ffffbb00ff33b000
IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
[...]

Call Trace:
[<ffffffff811b8aaa>] read_kcore+0x17a/0x370
[<ffffffff811ad847>] proc_reg_read+0x77/0xc0
[<ffffffff81151687>] vfs_read+0xc7/0x130
[<ffffffff811517f3>] sys_read+0x53/0xa0
[<ffffffff81449692>] system_call_fastpath+0x16/0x1b

Investigation determined that the bug triggered when reading
system RAM at the 4G mark. On this system, that was the first
address using 1G pages for the virt->phys direct mapping so the
PUD is pointing to a physical address, not a PMD page.

The problem is that the page table walker in kern_addr_valid() is
not checking pud_large() and treats the physical address as if
it was a PMD. If it happens to look like pmd_none then it'll
silently fail, probably returning zeros instead of real data. If
the data happens to look like a present PMD though, it will be
walked resulting in the oops above.

This patch adds the necessary pud_large() check.

Unfortunately the problem was not readily reproducible and now
they are running the backup program without accessing
/proc/kcore so the patch has not been validated but I think it
makes sense.

Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/include/asm/pgtable.h | 5 +++++
arch/x86/mm/init_64.c | 3 +++
2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a34c785c5a63..321d24d6a924 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -132,6 +132,11 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
return (pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT;
}

+static inline unsigned long pud_pfn(pud_t pud)
+{
+ return (pud_val(pud) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
#define pte_page(pte) pfn_to_page(pte_pfn(pte))

static inline int pmd_large(pmd_t pte)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ee41bba315d1..3cd243b8b01d 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -864,6 +864,9 @@ int kern_addr_valid(unsigned long addr)
if (pud_none(*pud))
return 0;

+ if (pud_large(*pud))
+ return pfn_valid(pud_pfn(*pud));
+
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd))
return 0;
--
1.8.5.2

2014-02-05 20:46:19

by Julian Anastasov

[permalink] [raw]
Subject: Re: [v2.6.34-stable 032/213] ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)


Hello,

On Wed, 5 Feb 2014, Paul Gortmaker wrote:

> From: Mathias Krause <[email protected]>
>
> -------------------
> This is a commit scheduled for the next v2.6.34 longterm release.
> http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
> If you see a problem with using this for longterm, please comment.
> -------------------
>
> commit 2d8a041b7bfe1097af21441cb77d6af95f4f4680 upstream.
>
> If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
> not set, __ip_vs_get_timeouts() does not fully initialize the structure
> that gets copied to userland and that for leaks up to 12 bytes of kernel
> stack. Add an explicit memset(0) before passing the structure to
> __ip_vs_get_timeouts() to avoid the info leak.

I guess, this patch is not needed after commit
"ipvs: initialize returned data in do_ip_vs_get_ctl" because
the memset is already moved into __ip_vs_get_timeouts().

> Signed-off-by: Mathias Krause <[email protected]>
> Cc: Wensong Zhang <[email protected]>
> Cc: Simon Horman <[email protected]>
> Cc: Julian Anastasov <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
> Signed-off-by: Paul Gortmaker <[email protected]>
> ---
> net/netfilter/ipvs/ip_vs_ctl.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 36dc1d88c2fa..bd9d805a85a6 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -2469,6 +2469,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
> {
> struct ip_vs_timeout_user t;
>
> + memset(&t, 0, sizeof(t));
> __ip_vs_get_timeouts(&t);
> if (copy_to_user(user, &t, sizeof(t)) != 0)
> ret = -EFAULT;
> --
> 1.8.5.2

Regards

--
Julian Anastasov <[email protected]>

2014-02-05 20:46:44

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 108/213] x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.

From: Jan Beulich <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream.

This fixes CVE-2013-0228 / XSA-42

Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user
in 32bit PV guest can use to crash the > guest with the panic like this:

-------------
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: scsi_wait_scan]

Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
EIP is at xen_iret+0x12/0x2b
EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
Stack:
00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
Call Trace:
Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
general protection fault: 0000 [#2]
---[ end trace ab0d29a492dcd330 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1250, comm: r Tainted: G D ---------------
2.6.32-356.el6.i686 #1
Call Trace:
[<c08476df>] ? panic+0x6e/0x122
[<c084b63c>] ? oops_end+0xbc/0xd0
[<c084b260>] ? do_general_protection+0x0/0x210
[<c084a9b7>] ? error_code+0x73/
-------------

Petr says: "
I've analysed the bug and I think that xen_iret() cannot cope with
mangled DS, in this case zeroed out (null selector/descriptor) by either
xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
entry was invalidated by the reproducer. "

Jan took a look at the preliminary patch and came up a fix that solves
this problem:

"This code gets called after all registers other than those handled by
IRET got already restored, hence a null selector in %ds or a non-null
one that got loaded from a code or read-only data descriptor would
cause a kernel mode fault (with the potential of crashing the kernel
as a whole, if panic_on_oops is set)."

The way to fix this is to realize that the we can only relay on the
registers that IRET restores. The two that are guaranteed are the
%cs and %ss as they are always fixed GDT selectors. Also they are
inaccessible from user mode - so they cannot be altered. This is
the approach taken in this patch.

Another alternative option suggested by Jan would be to relay on
the subtle realization that using the %ebp or %esp relative references uses
the %ss segment. In which case we could switch from using %eax to %ebp and
would not need the %ss over-rides. That would also require one extra
instruction to compensate for the one place where the register is used
as scaled index. However Andrew pointed out that is too subtle and if
further work was to be done in this code-path it could escape folks attention
and lead to accidents.

Reviewed-by: Petr Matousek <[email protected]>
Reported-by: Petr Matousek <[email protected]>
Reviewed-by: Andrew Cooper <[email protected]>
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/xen/xen-asm_32.S | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S
index b040b0e518ca..7328f71651e1 100644
--- a/arch/x86/xen/xen-asm_32.S
+++ b/arch/x86/xen/xen-asm_32.S
@@ -88,11 +88,11 @@ ENTRY(xen_iret)
*/
#ifdef CONFIG_SMP
GET_THREAD_INFO(%eax)
- movl TI_cpu(%eax), %eax
- movl __per_cpu_offset(,%eax,4), %eax
- mov xen_vcpu(%eax), %eax
+ movl %ss:TI_cpu(%eax), %eax
+ movl %ss:__per_cpu_offset(,%eax,4), %eax
+ mov %ss:xen_vcpu(%eax), %eax
#else
- movl xen_vcpu, %eax
+ movl %ss:xen_vcpu, %eax
#endif

/* check IF state we're restoring */
@@ -105,11 +105,11 @@ ENTRY(xen_iret)
* resuming the code, so we don't have to be worried about
* being preempted to another CPU.
*/
- setz XEN_vcpu_info_mask(%eax)
+ setz %ss:XEN_vcpu_info_mask(%eax)
xen_iret_start_crit:

/* check for unmasked and pending */
- cmpw $0x0001, XEN_vcpu_info_pending(%eax)
+ cmpw $0x0001, %ss:XEN_vcpu_info_pending(%eax)

/*
* If there's something pending, mask events again so we can
@@ -117,7 +117,7 @@ xen_iret_start_crit:
* touch XEN_vcpu_info_mask.
*/
jne 1f
- movb $1, XEN_vcpu_info_mask(%eax)
+ movb $1, %ss:XEN_vcpu_info_mask(%eax)

1: popl %eax

--
1.8.5.2

2014-02-05 20:46:41

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 114/213] xen/bootup: allow {read|write}_cr8 pvops call.

From: Konrad Rzeszutek Wilk <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 1a7bbda5b1ab0e02622761305a32dc38735b90b2 upstream.

We actually do not do anything about it. Just return a default
value of zero and if the kernel tries to write anything but 0
we BUG_ON.

This fixes the case when an user tries to suspend the machine
and it blows up in save_processor_state b/c 'read_cr8' is set
to NULL and we get:

kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
invalid opcode: 0000 [#1] SMP
Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270

.. snip..
Call Trace:
[<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
[<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
[<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/xen/enlighten.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 7fa0dc02b939..1b7d744f6f30 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -779,7 +779,16 @@ static void xen_write_cr4(unsigned long cr4)

native_write_cr4(cr4);
}
-
+#ifdef CONFIG_X86_64
+static inline unsigned long xen_read_cr8(void)
+{
+ return 0;
+}
+static inline void xen_write_cr8(unsigned long val)
+{
+ BUG_ON(val);
+}
+#endif
static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
{
int ret;
@@ -945,6 +954,11 @@ static const struct pv_cpu_ops xen_cpu_ops __initdata = {
.read_cr4_safe = native_read_cr4_safe,
.write_cr4 = xen_write_cr4,

+#ifdef CONFIG_X86_64
+ .read_cr8 = xen_read_cr8,
+ .write_cr8 = xen_write_cr8,
+#endif
+
.wbinvd = native_wbinvd,

.read_msr = native_read_msr_safe,
--
1.8.5.2

2014-02-05 20:46:38

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 113/213] xen/bootup: allow read_tscp call for Xen PV guests.

From: Konrad Rzeszutek Wilk <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit cd0608e71e9757f4dae35bcfb4e88f4d1a03a8ab upstream.

The hypervisor will trap it. However without this patch,
we would crash as the .read_tscp is set to NULL. This patch
fixes it and sets it to the native_read_tscp call.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/xen/enlighten.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 25d787c17ad8..7fa0dc02b939 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -952,6 +952,8 @@ static const struct pv_cpu_ops xen_cpu_ops __initdata = {
.read_tsc = native_read_tsc,
.read_pmc = native_read_pmc,

+ .read_tscp = native_read_tscp,
+
.iret = xen_iret,
.irq_enable_sysexit = xen_sysexit,
#ifdef CONFIG_X86_64
--
1.8.5.2

2014-02-05 20:06:18

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 095/213] Fix a dead loop in async_synchronize_full()

From: Li Zhong <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

[commit 45516ddc16abc923104d78bb3eb772ac0a09e33e in v3.0.44 - paulg ]

[Fixed upstream by commits 2955b47d2c1983998a8c5915cb96884e67f7cb53 and
a4683487f90bfe3049686fc5c566bdc1ad03ace6 from Dan Williams, but they are much
more intrusive than this tiny fix, according to Andrew - gregkh]

This patch tries to fix a dead loop in async_synchronize_full(), which
could be seen when preemption is disabled on a single cpu machine.

void async_synchronize_full(void)
{
do {
async_synchronize_cookie(next_cookie);
} while (!list_empty(&async_running) || !
list_empty(&async_pending));
}

async_synchronize_cookie() calls async_synchronize_cookie_domain() with
&async_running as the default domain to synchronize.

However, there might be some works in the async_pending list from other
domains. On a single cpu system, without preemption, there is no chance
for the other works to finish, so async_synchronize_full() enters a dead
loop.

It seems async_synchronize_full() wants to synchronize all entries in
all running lists(domains), so maybe we could just check the entry_count
to know whether all works are finished.

Currently, async_synchronize_cookie_domain() expects a non-NULL running
list ( if NULL, there would be NULL pointer dereference ), so maybe a
NULL pointer could be used as an indication for the functions to
synchronize all works in all domains.

Reported-by: Paul E. McKenney <[email protected]>
Signed-off-by: Li Zhong <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Tested-by: Christian Kujau <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Christian Kujau <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Cong Wang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/async.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/async.c b/kernel/async.c
index 15319d6c18fe..b640ffc3f5a1 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -94,6 +94,13 @@ static async_cookie_t __lowest_in_progress(struct list_head *running)
{
struct async_entry *entry;

+ if (!running) { /* just check the entry count */
+ if (atomic_read(&entry_count))
+ return 0; /* smaller than any cookie */
+ else
+ return next_cookie;
+ }
+
if (!list_empty(running)) {
entry = list_first_entry(running,
struct async_entry, list);
@@ -249,9 +256,7 @@ EXPORT_SYMBOL_GPL(async_schedule_domain);
*/
void async_synchronize_full(void)
{
- do {
- async_synchronize_cookie(next_cookie);
- } while (!list_empty(&async_running) || !list_empty(&async_pending));
+ async_synchronize_cookie_domain(next_cookie, NULL);
}
EXPORT_SYMBOL_GPL(async_synchronize_full);

@@ -271,7 +276,7 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
/**
* async_synchronize_cookie_domain - synchronize asynchronous function calls within a certain domain with cookie checkpointing
* @cookie: async_cookie_t to use as checkpoint
- * @running: running list to synchronize on
+ * @running: running list to synchronize on, NULL indicates all lists
*
* This function waits until all asynchronous function calls for the
* synchronization domain specified by the running list @list submitted
--
1.8.5.2

2014-02-05 20:47:53

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 103/213] mm: fix invalidate_complete_page2() lock ordering

From: Hugh Dickins <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit ec4d9f626d5908b6052c2973f37992f1db52e967 upstream.

In fuzzing with trinity, lockdep protested "possible irq lock inversion
dependency detected" when isolate_lru_page() reenabled interrupts while
still holding the supposedly irq-safe tree_lock:

invalidate_inode_pages2
invalidate_complete_page2
spin_lock_irq(&mapping->tree_lock)
clear_page_mlock
isolate_lru_page
spin_unlock_irq(&zone->lru_lock)

isolate_lru_page() is correct to enable interrupts unconditionally:
invalidate_complete_page2() is incorrect to call clear_page_mlock() while
holding tree_lock, which is supposed to nest inside lru_lock.

Both truncate_complete_page() and invalidate_complete_page() call
clear_page_mlock() before taking tree_lock to remove page from radix_tree.
I guess invalidate_complete_page2() preferred to test PageDirty (again)
under tree_lock before committing to the munlock; but since the page has
already been unmapped, its state is already somewhat inconsistent, and no
worse if clear_page_mlock() moved up.

Reported-by: Sasha Levin <[email protected]>
Deciphered-by: Andrew Morton <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Ying Han <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
mm/truncate.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index f42675a3615d..d0698a15c61f 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -381,11 +381,12 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL))
return 0;

+ clear_page_mlock(page);
+
spin_lock_irq(&mapping->tree_lock);
if (PageDirty(page))
goto failed;

- clear_page_mlock(page);
BUG_ON(page_has_private(page));
__remove_from_page_cache(page);
spin_unlock_irq(&mapping->tree_lock);
--
1.8.5.2

2014-02-05 20:48:21

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 110/213] x86, tls: Off by one limit check

From: Dan Carpenter <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 8f0750f19789cf352d7e24a6cc50f2ab1b4f1372 upstream.

These are used as offsets into an array of GDT_ENTRY_TLS_ENTRIES members
so GDT_ENTRY_TLS_ENTRIES is one past the end of the array.

Signed-off-by: Dan Carpenter <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/kernel/tls.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c
index 6bb7b8579e70..bcfec2d23769 100644
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -163,7 +163,7 @@ int regset_tls_get(struct task_struct *target, const struct user_regset *regset,
{
const struct desc_struct *tls;

- if (pos > GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc) ||
+ if (pos >= GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc) ||
(pos % sizeof(struct user_desc)) != 0 ||
(count % sizeof(struct user_desc)) != 0)
return -EINVAL;
@@ -198,7 +198,7 @@ int regset_tls_set(struct task_struct *target, const struct user_regset *regset,
struct user_desc infobuf[GDT_ENTRY_TLS_ENTRIES];
const struct user_desc *info;

- if (pos > GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc) ||
+ if (pos >= GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc) ||
(pos % sizeof(struct user_desc)) != 0 ||
(count % sizeof(struct user_desc)) != 0)
return -EINVAL;
--
1.8.5.2

2014-02-05 20:06:15

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 096/213] Prevent interface errors with Seagate FreeAgent GoFlex

From: Daniel J Blueman <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c531077f40abc9f2129c4c83a30b3f8d6ce1c0e7 upstream.

When using my Seagate FreeAgent GoFlex eSATAp external disk enclosure,
interface errors are always seen until 1.5Gbps is negotiated [1]. This
occurs using any disk in the enclosure, and when the disk is connected
directly with a generic passive eSATAp cable, we see stable 3Gbps
operation as expected.

Blacklist 3Gbps mode to avoid dataloss and the ~30s delay bus reset
and renegotiation incurs.

Signed-off-by: Daniel J Blueman <[email protected]>
Signed-off-by: Jeff Garzik <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/ata/libata-core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6e4b795d79e0..ce0ba625bd62 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4407,6 +4407,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {

/* Devices which aren't very happy with higher link speeds */
{ "WD My Book", NULL, ATA_HORKAGE_1_5_GBPS, },
+ { "Seagate FreeAgent GoFlex", NULL, ATA_HORKAGE_1_5_GBPS, },

/*
* Devices which choke on SETXFER. Applies only if both the
--
1.8.5.2

2014-02-05 20:06:13

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 082/213] timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE

From: Tirupathi Reddy <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 42a5cf46cd56f46267d2a9fcf2655f4078cd3042 upstream.

An inactive timer's base can refer to a offline cpu's base.

In the current code, cpu_base's lock is blindly reinitialized each
time a CPU is brought up. If a CPU is brought online during the period
that another thread is trying to modify an inactive timer on that CPU
with holding its timer base lock, then the lock will be reinitialized
under its feet. This leads to following SPIN_BUG().

<0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466
<0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1
<4> [<c0013dc4>] (unwind_backtrace+0x0/0x11c) from [<c026e794>] (do_raw_spin_unlock+0x40/0xcc)
<4> [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) from [<c076c160>] (_raw_spin_unlock+0x8/0x30)
<4> [<c076c160>] (_raw_spin_unlock+0x8/0x30) from [<c009b858>] (mod_timer+0x294/0x310)
<4> [<c009b858>] (mod_timer+0x294/0x310) from [<c00a5e04>] (queue_delayed_work_on+0x104/0x120)
<4> [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) from [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c)
<4> [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) from [<c04d8780>] (sdhci_disable+0x40/0x48)
<4> [<c04d8780>] (sdhci_disable+0x40/0x48) from [<c04bf300>] (mmc_release_host+0x4c/0xb0)
<4> [<c04bf300>] (mmc_release_host+0x4c/0xb0) from [<c04c7aac>] (mmc_sd_detect+0x90/0xfc)
<4> [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) from [<c04c2504>] (mmc_rescan+0x7c/0x2c4)
<4> [<c04c2504>] (mmc_rescan+0x7c/0x2c4) from [<c00a6a7c>] (process_one_work+0x27c/0x484)
<4> [<c00a6a7c>] (process_one_work+0x27c/0x484) from [<c00a6e94>] (worker_thread+0x210/0x3b0)
<4> [<c00a6e94>] (worker_thread+0x210/0x3b0) from [<c00aad9c>] (kthread+0x80/0x8c)
<4> [<c00aad9c>] (kthread+0x80/0x8c) from [<c000ea80>] (kernel_thread_exit+0x0/0x8)

As an example, this particular crash occurred when CPU #3 is executing
mod_timer() on an inactive timer whose base is refered to offlined CPU
#2. The code locked the timer_base corresponding to CPU #2. Before it
could proceed, CPU #2 came online and reinitialized the spinlock
corresponding to its base. Thus now CPU #3 held a lock which was
reinitialized. When CPU #3 finally ended up unlocking the old cpu_base
corresponding to CPU #2, we hit the above SPIN_BUG().

CPU #0 CPU #3 CPU #2
------ ------- -------
..... ...... <Offline>
mod_timer()
lock_timer_base
spin_lock_irqsave(&base->lock)

cpu_up(2) ..... ......
init_timers_cpu()
.... ..... spin_lock_init(&base->lock)
..... spin_unlock_irqrestore(&base->lock) ......
<spin_bug>

Allocation of per_cpu timer vector bases is done only once under
"tvec_base_done[]" check. In the current code, spinlock_initialization
of base->lock isn't under this check. When a CPU is up each time the
base lock is reinitialized. Move base spinlock initialization under
the check.

Signed-off-by: Tirupathi Reddy <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/timer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index b7e39516bf0b..12aa4bdb150d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1552,12 +1552,12 @@ static int __cpuinit init_timers_cpu(int cpu)
boot_done = 1;
base = &boot_tvec_bases;
}
+ spin_lock_init(&base->lock);
tvec_base_done[cpu] = 1;
} else {
base = per_cpu(tvec_bases, cpu);
}

- spin_lock_init(&base->lock);

for (j = 0; j < TVN_SIZE; j++) {
INIT_LIST_HEAD(base->tv5.vec + j);
--
1.8.5.2

2014-02-05 20:06:10

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 089/213] wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task

From: Oleg Nesterov <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 9067ac85d533651b98c2ff903182a20cbb361fcb upstream.

wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().

TASK_ALL has no other users, probably can be killed.

Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[PG: kernel/sched/core.c --> kernel/sched.c on 2.6.34]
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/sched.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index e24d139ff694..ddcb8aeb2f59 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2492,7 +2492,8 @@ out:
*/
int wake_up_process(struct task_struct *p)
{
- return try_to_wake_up(p, TASK_ALL, 0);
+ WARN_ON(task_is_stopped_or_traced(p));
+ return try_to_wake_up(p, TASK_NORMAL, 0);
}
EXPORT_SYMBOL(wake_up_process);

--
1.8.5.2

2014-02-05 20:49:09

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 106/213] mempolicy: fix a race in shared_policy_replace()

From: Mel Gorman <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b22d127a39ddd10d93deee3d96e643657ad53a49 upstream.

shared_policy_replace() use of sp_alloc() is unsafe. 1) sp_node cannot
be dereferenced if sp->lock is not held and 2) another thread can modify
sp_node between spin_unlock for allocating a new sp node and next
spin_lock. The bug was introduced before 2.6.12-rc2.

Kosaki's original patch for this problem was to allocate an sp node and
policy within shared_policy_replace and initialise it when the lock is
reacquired. I was not keen on this approach because it partially
duplicates sp_alloc(). As the paths were sp->lock is taken are not that
performance critical this patch converts sp->lock to sp->mutex so it can
sleep when calling sp_alloc().

[[email protected]: Original patch]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Christoph Lameter <[email protected]>
Cc: Josh Boyer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
include/linux/mempolicy.h | 2 +-
mm/mempolicy.c | 37 ++++++++++++++++---------------------
2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 1cc966cd3e5f..be6db86cc748 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -180,7 +180,7 @@ struct sp_node {

struct shared_policy {
struct rb_root root;
- spinlock_t lock;
+ struct mutex mutex;
};

void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index c7f53b1228b6..ae43da3aff5a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1835,7 +1835,7 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b)
*/

/* lookup first element intersecting start-end */
-/* Caller holds sp->lock */
+/* Caller holds sp->mutex */
static struct sp_node *
sp_lookup(struct shared_policy *sp, unsigned long start, unsigned long end)
{
@@ -1899,13 +1899,13 @@ mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)

if (!sp->root.rb_node)
return NULL;
- spin_lock(&sp->lock);
+ mutex_lock(&sp->mutex);
sn = sp_lookup(sp, idx, idx+1);
if (sn) {
mpol_get(sn->policy);
pol = sn->policy;
}
- spin_unlock(&sp->lock);
+ mutex_unlock(&sp->mutex);
return pol;
}

@@ -1936,10 +1936,10 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
static int shared_policy_replace(struct shared_policy *sp, unsigned long start,
unsigned long end, struct sp_node *new)
{
- struct sp_node *n, *new2 = NULL;
+ struct sp_node *n;
+ int ret = 0;

-restart:
- spin_lock(&sp->lock);
+ mutex_lock(&sp->mutex);
n = sp_lookup(sp, start, end);
/* Take care of old policies in the same range. */
while (n && n->start < end) {
@@ -1952,16 +1952,14 @@ restart:
} else {
/* Old policy spanning whole new range. */
if (n->end > end) {
+ struct sp_node *new2;
+ new2 = sp_alloc(end, n->end, n->policy);
if (!new2) {
- spin_unlock(&sp->lock);
- new2 = sp_alloc(end, n->end, n->policy);
- if (!new2)
- return -ENOMEM;
- goto restart;
+ ret = -ENOMEM;
+ goto out;
}
n->end = start;
sp_insert(sp, new2);
- new2 = NULL;
break;
} else
n->end = start;
@@ -1972,12 +1970,9 @@ restart:
}
if (new)
sp_insert(sp, new);
- spin_unlock(&sp->lock);
- if (new2) {
- mpol_put(new2->policy);
- kmem_cache_free(sn_cache, new2);
- }
- return 0;
+out:
+ mutex_unlock(&sp->mutex);
+ return ret;
}

/**
@@ -1995,7 +1990,7 @@ void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol)
int ret;

sp->root = RB_ROOT; /* empty tree == default mempolicy */
- spin_lock_init(&sp->lock);
+ mutex_init(&sp->mutex);

if (mpol) {
struct vm_area_struct pvma;
@@ -2063,7 +2058,7 @@ void mpol_free_shared_policy(struct shared_policy *p)

if (!p->root.rb_node)
return;
- spin_lock(&p->lock);
+ mutex_lock(&p->mutex);
next = rb_first(&p->root);
while (next) {
n = rb_entry(next, struct sp_node, nd);
@@ -2072,7 +2067,7 @@ void mpol_free_shared_policy(struct shared_policy *p)
mpol_put(n->policy);
kmem_cache_free(sn_cache, n);
}
- spin_unlock(&p->lock);
+ mutex_unlock(&p->mutex);
}

/* assumes fs == KERNEL_DS */
--
1.8.5.2

2014-02-05 20:49:51

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 079/213] hfsplus: fix potential overflow in hfsplus_file_truncate()

From: Vyacheslav Dubeyko <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 12f267a20aecf8b84a2a9069b9011f1661c779b4 upstream.

Change a u32 to loff_t hfsplus_file_truncate().

Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Hin-Tak Leung <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/hfsplus/extents.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/hfsplus/extents.c b/fs/hfsplus/extents.c
index 0022eec63cda..b3d234e5b498 100644
--- a/fs/hfsplus/extents.c
+++ b/fs/hfsplus/extents.c
@@ -447,7 +447,7 @@ void hfsplus_file_truncate(struct inode *inode)
struct address_space *mapping = inode->i_mapping;
struct page *page;
void *fsdata;
- u32 size = inode->i_size;
+ loff_t size = inode->i_size;
int res;

res = pagecache_write_begin(NULL, mapping, size, 0,
--
1.8.5.2

2014-02-05 20:50:15

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 104/213] mm: mmu_notifier: fix freed page still mapped in secondary MMU

From: Xiao Guangrong <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 3ad3d901bbcfb15a5e4690e55350db0899095a68 upstream.

mmu_notifier_release() is called when the process is exiting. It will
delete all the mmu notifiers. But at this time the page belonging to the
process is still present in page tables and is present on the LRU list, so
this race will happen:

CPU 0 CPU 1
mmu_notifier_release: try_to_unmap:
hlist_del_init_rcu(&mn->hlist);
ptep_clear_flush_notify:
mmu nofifler not found
free page !!!!!!
/*
* At the point, the page has been
* freed, but it is still mapped in
* the secondary MMU.
*/

mn->ops->release(mn, mm);

Then the box is not stable and sometimes we can get this bug:

[ 738.075923] BUG: Bad page state in process migrate-perf pfn:03bec
[ 738.075931] page:ffffea00000efb00 count:0 mapcount:0 mapping: (null) index:0x8076
[ 738.075936] page flags: 0x20000000000014(referenced|dirty)

The same issue is present in mmu_notifier_unregister().

We can call ->release before deleting the notifier to ensure the page has
been unmapped from the secondary MMU before it is freed.

Signed-off-by: Xiao Guangrong <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
mm/mmu_notifier.c | 45 +++++++++++++++++++++++----------------------
1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 438951d366f2..0b54146f986e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -33,6 +33,24 @@
void __mmu_notifier_release(struct mm_struct *mm)
{
struct mmu_notifier *mn;
+ struct hlist_node *n;
+
+ /*
+ * RCU here will block mmu_notifier_unregister until
+ * ->release returns.
+ */
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist)
+ /*
+ * if ->release runs before mmu_notifier_unregister it
+ * must be handled as it's the only way for the driver
+ * to flush all existing sptes and stop the driver
+ * from establishing any more sptes before all the
+ * pages in the mm are freed.
+ */
+ if (mn->ops->release)
+ mn->ops->release(mn, mm);
+ rcu_read_unlock();

spin_lock(&mm->mmu_notifier_mm->lock);
while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
@@ -46,23 +64,6 @@ void __mmu_notifier_release(struct mm_struct *mm)
* mmu_notifier_unregister to return.
*/
hlist_del_init_rcu(&mn->hlist);
- /*
- * RCU here will block mmu_notifier_unregister until
- * ->release returns.
- */
- rcu_read_lock();
- spin_unlock(&mm->mmu_notifier_mm->lock);
- /*
- * if ->release runs before mmu_notifier_unregister it
- * must be handled as it's the only way for the driver
- * to flush all existing sptes and stop the driver
- * from establishing any more sptes before all the
- * pages in the mm are freed.
- */
- if (mn->ops->release)
- mn->ops->release(mn, mm);
- rcu_read_unlock();
- spin_lock(&mm->mmu_notifier_mm->lock);
}
spin_unlock(&mm->mmu_notifier_mm->lock);

@@ -264,16 +265,13 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
{
BUG_ON(atomic_read(&mm->mm_count) <= 0);

- spin_lock(&mm->mmu_notifier_mm->lock);
if (!hlist_unhashed(&mn->hlist)) {
- hlist_del_rcu(&mn->hlist);
-
/*
* RCU here will force exit_mmap to wait ->release to finish
* before freeing the pages.
*/
rcu_read_lock();
- spin_unlock(&mm->mmu_notifier_mm->lock);
+
/*
* exit_mmap will block in mmu_notifier_release to
* guarantee ->release is called before freeing the
@@ -282,8 +280,11 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
if (mn->ops->release)
mn->ops->release(mn, mm);
rcu_read_unlock();
- } else
+
+ spin_lock(&mm->mmu_notifier_mm->lock);
+ hlist_del_rcu(&mn->hlist);
spin_unlock(&mm->mmu_notifier_mm->lock);
+ }

/*
* Wait any running method to finish, of course including
--
1.8.5.2

2014-02-05 20:06:07

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 087/213] ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread

From: Oleg Nesterov <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 0666fb51b1483f27506e212cc7f7b2645b5c7acc upstream.

It is not clear why ptrace_resume() does wake_up_process(). Unless the
caller is PTRACE_KILL the tracee should be TASK_TRACED so we can use
wake_up_state(__TASK_TRACED). If sys_ptrace() races with SIGKILL we do
not need the extra and potentionally spurious wakeup.

If the caller is PTRACE_KILL, wake_up_process() is even more wrong.
The tracee can sleep in any state in any place, and if we have a buggy
code which doesn't handle a spurious wakeup correctly PTRACE_KILL can
be used to exploit it. For example:

int main(void)
{
int child, status;

child = fork();
if (!child) {
int ret;

assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);

ret = pause();
printf("pause: %d %m\n", ret);

return 0x23;
}

sleep(1);
assert(ptrace(PTRACE_KILL, child, 0,0) == 0);

assert(child == wait(&status));
printf("wait: %x\n", status);

return 0;
}

prints "pause: -1 Unknown error 514", -ERESTARTNOHAND leaks to the
userland. In this case sys_pause() is buggy as well and should be
fixed.

I do not know what was the original rationality behind PTRACE_KILL.
The man page is simply wrong and afaics it was always wrong. Imho
it should be deprecated, or may be it should do send_sig(SIGKILL)
as Denys suggests, but in any case I do not think that the current
behaviour was intentional.

Note: there is another problem, ptrace_resume() changes ->exit_code
and this can race with SIGKILL too. Eventually we should change ptrace
to not use ->exit_code.

Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index b7b491e6c25b..9450ec22e5a6 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -507,7 +507,7 @@ static int ptrace_resume(struct task_struct *child, long request, long data)
}

child->exit_code = data;
- wake_up_process(child);
+ wake_up_state(child, __TASK_TRACED);

return 0;
}
--
1.8.5.2

2014-02-05 20:50:11

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 107/213] ALSA: seq: Fix missing error handling in snd_seq_timer_open()

From: Takashi Iwai <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 66efdc71d95887b652a742a5dae51fa834d71465 upstream.

snd_seq_timer_open() didn't catch the whole error path but let through
if the timer id is a slave. This may lead to Oops by accessing the
uninitialized pointer.

BUG: unable to handle kernel NULL pointer dereference at 00000000000002ae
IP: [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
PGD 785cd067 PUD 76964067 PMD 0
Oops: 0002 [#4] SMP
CPU 0
Pid: 4288, comm: trinity-child7 Tainted: G D W 3.9.0-rc1+ #100 Bochs Bochs
RIP: 0010:[<ffffffff819b3477>] [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
RSP: 0018:ffff88006ece7d38 EFLAGS: 00010246
RAX: 0000000000000286 RBX: ffff88007851b400 RCX: 0000000000000000
RDX: 000000000000ffff RSI: ffff88006ece7d58 RDI: ffff88006ece7d38
RBP: ffff88006ece7d98 R08: 000000000000000a R09: 000000000000fffe
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8800792c5400 R14: 0000000000e8f000 R15: 0000000000000007
FS: 00007f7aaa650700(0000) GS:ffff88007f800000(0000) GS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002ae CR3: 000000006efec000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process trinity-child7 (pid: 4288, threadinfo ffff88006ece6000, task ffff880076a8a290)
Stack:
0000000000000286 ffffffff828f2be0 ffff88006ece7d58 ffffffff810f354d
65636e6575716573 2065756575712072 ffff8800792c0030 0000000000000000
ffff88006ece7d98 ffff8800792c5400 ffff88007851b400 ffff8800792c5520
Call Trace:
[<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff819b17e9>] snd_seq_queue_timer_open+0x29/0x70
[<ffffffff819ae01a>] snd_seq_ioctl_set_queue_timer+0xda/0x120
[<ffffffff819acb9b>] snd_seq_do_ioctl+0x9b/0xd0
[<ffffffff819acbe0>] snd_seq_ioctl+0x10/0x20
[<ffffffff811b9542>] do_vfs_ioctl+0x522/0x570
[<ffffffff8130a4b3>] ? file_has_perm+0x83/0xa0
[<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff811b95ed>] sys_ioctl+0x5d/0xa0
[<ffffffff813663fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81faed69>] system_call_fastpath+0x16/0x1b

Reported-and-tested-by: Tommi Rantala <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
sound/core/seq/seq_timer.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/core/seq/seq_timer.c b/sound/core/seq/seq_timer.c
index 160b1bd0cd62..24d44b2f61ac 100644
--- a/sound/core/seq/seq_timer.c
+++ b/sound/core/seq/seq_timer.c
@@ -290,10 +290,10 @@ int snd_seq_timer_open(struct snd_seq_queue *q)
tid.device = SNDRV_TIMER_GLOBAL_SYSTEM;
err = snd_timer_open(&t, str, &tid, q->queue);
}
- if (err < 0) {
- snd_printk(KERN_ERR "seq fatal error: cannot create timer (%i)\n", err);
- return err;
- }
+ }
+ if (err < 0) {
+ snd_printk(KERN_ERR "seq fatal error: cannot create timer (%i)\n", err);
+ return err;
}
t->callback = snd_seq_timer_interrupt;
t->callback_data = q;
--
1.8.5.2

2014-02-05 20:06:05

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 084/213] gen_init_cpio: avoid stack overflow when expanding

From: Kees Cook <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 20f1de659b77364d55d4e7fad2ef657e7730323f upstream.

Fix possible overflow of the buffer used for expanding environment
variables when building file list.

In the extremely unlikely case of an attacker having control over the
environment variables visible to gen_init_cpio, control over the
contents of the file gen_init_cpio parses, and gen_init_cpio was built
without compiler hardening, the attacker can gain arbitrary execution
control via a stack buffer overflow.

$ cat usr/crash.list
file foo ${BIG}${BIG}${BIG}${BIG}${BIG}${BIG} 0755 0 0
$ BIG=$(perl -e 'print "A" x 4096;') ./usr/gen_init_cpio usr/crash.list
*** buffer overflow detected ***: ./usr/gen_init_cpio terminated

This also replaces the space-indenting with tabs.

Patch based on existing fix extracted from grsecurity.

Signed-off-by: Kees Cook <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Brad Spengler <[email protected]>
Cc: PaX Team <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
usr/gen_init_cpio.c | 43 +++++++++++++++++++++++--------------------
1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/usr/gen_init_cpio.c b/usr/gen_init_cpio.c
index b2b3c2d1cf8b..0478657c15bc 100644
--- a/usr/gen_init_cpio.c
+++ b/usr/gen_init_cpio.c
@@ -299,7 +299,7 @@ static int cpio_mkfile(const char *name, const char *location,
int retval;
int rc = -1;
int namesize;
- int i;
+ unsigned int i;

mode |= S_IFREG;

@@ -375,25 +375,28 @@ error:

static char *cpio_replace_env(char *new_location)
{
- char expanded[PATH_MAX + 1];
- char env_var[PATH_MAX + 1];
- char *start;
- char *end;
-
- for (start = NULL; (start = strstr(new_location, "${")); ) {
- end = strchr(start, '}');
- if (start < end) {
- *env_var = *expanded = '\0';
- strncat(env_var, start + 2, end - start - 2);
- strncat(expanded, new_location, start - new_location);
- strncat(expanded, getenv(env_var), PATH_MAX);
- strncat(expanded, end + 1, PATH_MAX);
- strncpy(new_location, expanded, PATH_MAX);
- } else
- break;
- }
-
- return new_location;
+ char expanded[PATH_MAX + 1];
+ char env_var[PATH_MAX + 1];
+ char *start;
+ char *end;
+
+ for (start = NULL; (start = strstr(new_location, "${")); ) {
+ end = strchr(start, '}');
+ if (start < end) {
+ *env_var = *expanded = '\0';
+ strncat(env_var, start + 2, end - start - 2);
+ strncat(expanded, new_location, start - new_location);
+ strncat(expanded, getenv(env_var),
+ PATH_MAX - strlen(expanded));
+ strncat(expanded, end + 1,
+ PATH_MAX - strlen(expanded));
+ strncpy(new_location, expanded, PATH_MAX);
+ new_location[PATH_MAX] = 0;
+ } else
+ break;
+ }
+
+ return new_location;
}


--
1.8.5.2

2014-02-05 20:51:14

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 102/213] mm: bugfix: set current->reclaim_state to NULL while returning from kswapd()

From: Takamori Yamaguchi <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b0a8cc58e6b9aaae3045752059e5e6260c0b94bc upstream.

In kswapd(), set current->reclaim_state to NULL before returning, as
current->reclaim_state holds reference to variable on kswapd()'s stack.

In rare cases, while returning from kswapd() during memory offlining,
__free_slab() and freepages() can access the dangling pointer of
current->reclaim_state.

Signed-off-by: Takamori Yamaguchi <[email protected]>
Signed-off-by: Aaditya Kumar <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
mm/vmscan.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5c4620600333..365a899bcae0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2341,6 +2341,8 @@ static int kswapd(void *p)
if (!ret)
balance_pgdat(pgdat, order);
}
+
+ current->reclaim_state = NULL;
return 0;
}

--
1.8.5.2

2014-02-05 20:05:58

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 085/213] exec: do not leave bprm->interp on stack

From: Kees Cook <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b66c5984017533316fd1951770302649baf1aa33 upstream.

If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.

Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules. Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted. They leave bprm->interp
pointing to their local stack. This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.

After additional study, it seems that both recursion and restart remains
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.

This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.

For a proof of concept, see DoTest.sh from:

http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

Signed-off-by: Kees Cook <[email protected]>
Cc: halfdog <[email protected]>
Cc: P J P <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/binfmt_misc.c | 5 ++++-
fs/binfmt_script.c | 4 +++-
fs/exec.c | 15 +++++++++++++++
include/linux/binfmts.h | 1 +
4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 42b60b04ea06..fb939976d58c 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -176,7 +176,10 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
goto _error;
bprm->argc ++;

- bprm->interp = iname; /* for binfmt_script */
+ /* Update interp in case binfmt_script needs it. */
+ retval = bprm_change_interp(iname, bprm);
+ if (retval < 0)
+ goto _error;

interp_file = open_exec (iname);
retval = PTR_ERR (interp_file);
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index aca9d55afb22..73d51f39c89a 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -81,7 +81,9 @@ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
retval = copy_strings_kernel(1, &i_name, bprm);
if (retval) return retval;
bprm->argc++;
- bprm->interp = interp;
+ retval = bprm_change_interp(interp, bprm);
+ if (retval < 0)
+ return retval;

/*
* OK, now restart the process with the interpreter's dentry.
diff --git a/fs/exec.c b/fs/exec.c
index 4afb996086d5..0ee94fe2fe37 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1119,9 +1119,24 @@ void free_bprm(struct linux_binprm *bprm)
mutex_unlock(&current->cred_guard_mutex);
abort_creds(bprm->cred);
}
+ /* If a binfmt changed the interp, free it. */
+ if (bprm->interp != bprm->filename)
+ kfree(bprm->interp);
kfree(bprm);
}

+int bprm_change_interp(char *interp, struct linux_binprm *bprm)
+{
+ /* If a binfmt changed the interp, free it first. */
+ if (bprm->interp != bprm->filename)
+ kfree(bprm->interp);
+ bprm->interp = kstrdup(interp, GFP_KERNEL);
+ if (!bprm->interp)
+ return -ENOMEM;
+ return 0;
+}
+EXPORT_SYMBOL(bprm_change_interp);
+
/*
* install the new credentials for this executable
*/
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 074b620d5d8b..8e0957df83bb 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -131,6 +131,7 @@ extern int setup_arg_pages(struct linux_binprm * bprm,
unsigned long stack_top,
int executable_stack);
extern int bprm_mm_init(struct linux_binprm *bprm);
+extern int bprm_change_interp(char *interp, struct linux_binprm *bprm);
extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
extern int prepare_bprm_creds(struct linux_binprm *bprm);
extern void install_exec_creds(struct linux_binprm *bprm);
--
1.8.5.2

2014-02-05 20:51:33

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 097/213] tracing: Don't call page_to_pfn() if page is NULL

From: Wen Congyang <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 85f2a2ef1d0ab99523e0b947a2b723f5650ed6aa upstream.

When allocating memory fails, page is NULL. page_to_pfn() will
cause the kernel panicked if we don't use sparsemem vmemmap.

Link: http://lkml.kernel.org/r/[email protected]

Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
include/trace/events/kmem.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 3adca0ca9dbe..2018784c931a 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -250,7 +250,7 @@ TRACE_EVENT(mm_page_alloc,

TP_printk("page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s",
__entry->page,
- page_to_pfn(__entry->page),
+ __entry->page ? page_to_pfn(__entry->page) : 0,
__entry->order,
__entry->migratetype,
show_gfp_flags(__entry->gfp_flags))
@@ -276,7 +276,7 @@ DECLARE_EVENT_CLASS(mm_page,

TP_printk("page=%p pfn=%lu order=%u migratetype=%d percpu_refill=%d",
__entry->page,
- page_to_pfn(__entry->page),
+ __entry->page ? page_to_pfn(__entry->page) : 0,
__entry->order,
__entry->migratetype,
__entry->order == 0)
--
1.8.5.2

2014-02-05 20:51:39

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 098/213] tracing: Fix double free when function profile init failed

From: Namhyung Kim <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 83e03b3fe4daffdebbb42151d5410d730ae50bd1 upstream.

On the failure path, stat->start and stat->pages will refer same page.
So it'll attempt to free the same page again and get kernel panic.

Link: http://lkml.kernel.org/r/[email protected]

Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/trace/ftrace.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7dd746cc86ff..337f928749c7 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -480,7 +480,6 @@ int ftrace_profile_pages_init(struct ftrace_profile_stat *stat)
free_page(tmp);
}

- free_page((unsigned long)stat->pages);
stat->pages = NULL;
stat->start = NULL;

--
1.8.5.2

2014-02-05 20:51:47

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 092/213] kernel/resource.c: fix stack overflow in __reserve_region_with_split()

From: T Makphaibulchoke <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 4965f5667f36a95b41cda6638875bc992bd7d18b upstream.

Using a recursive call add a non-conflicting region in
__reserve_region_with_split() could result in a stack overflow in the case
that the recursive calls are too deep. Convert the recursive calls to an
iterative loop to avoid the problem.

Tested on a machine containing 135 regions. The kernel no longer panicked
with stack overflow.

Also tested with code arbitrarily adding regions with no conflict,
embedding two consecutive conflicts and embedding two non-consecutive
conflicts.

Signed-off-by: T Makphaibulchoke <[email protected]>
Reviewed-by: Ram Pai <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/resource.c | 50 ++++++++++++++++++++++++++++++++++++++------------
1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 9c358e263534..bb76480a566f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -613,6 +613,7 @@ static void __init __reserve_region_with_split(struct resource *root,
struct resource *parent = root;
struct resource *conflict;
struct resource *res = kzalloc(sizeof(*res), GFP_ATOMIC);
+ struct resource *next_res = NULL;

if (!res)
return;
@@ -622,21 +623,46 @@ static void __init __reserve_region_with_split(struct resource *root,
res->end = end;
res->flags = IORESOURCE_BUSY;

- conflict = __request_resource(parent, res);
- if (!conflict)
- return;
+ while (1) {

- /* failed, split and try again */
- kfree(res);
+ conflict = __request_resource(parent, res);
+ if (!conflict) {
+ if (!next_res)
+ break;
+ res = next_res;
+ next_res = NULL;
+ continue;
+ }

- /* conflict covered whole area */
- if (conflict->start <= start && conflict->end >= end)
- return;
+ /* conflict covered whole area */
+ if (conflict->start <= res->start &&
+ conflict->end >= res->end) {
+ kfree(res);
+ WARN_ON(next_res);
+ break;
+ }
+
+ /* failed, split and try again */
+ if (conflict->start > res->start) {
+ end = res->end;
+ res->end = conflict->start - 1;
+ if (conflict->end < end) {
+ next_res = kzalloc(sizeof(*next_res),
+ GFP_ATOMIC);
+ if (!next_res) {
+ kfree(res);
+ break;
+ }
+ next_res->name = name;
+ next_res->start = conflict->end + 1;
+ next_res->end = end;
+ next_res->flags = IORESOURCE_BUSY;
+ }
+ } else {
+ res->start = conflict->end + 1;
+ }
+ }

- if (conflict->start > start)
- __reserve_region_with_split(root, start, conflict->start-1, name);
- if (conflict->end < end)
- __reserve_region_with_split(root, conflict->end+1, end, name);
}

void __init reserve_region_with_split(struct resource *root,
--
1.8.5.2

2014-02-05 20:51:42

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 099/213] hugetlb: fix resv_map leak in error path

From: Dave Hansen <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c50ac050811d6485616a193eb0f37bfbd191cc89 upstream.

When called for anonymous (non-shared) mappings, hugetlb_reserve_pages()
does a resv_map_alloc(). It depends on code in hugetlbfs's
vm_ops->close() to release that allocation.

However, in the mmap() failure path, we do a plain unmap_region() without
the remove_vma() which actually calls vm_ops->close().

This is a decent fix. This leak could get reintroduced if new code (say,
after hugetlb_reserve_pages() in hugetlbfs_file_mmap()) decides to return
an error. But, I think it would have to unroll the reservation anyway.

Christoph's test case:

http://marc.info/?l=linux-mm&m=133728900729735

This patch applies to 3.4 and later. A version for earlier kernels is at
https://lkml.org/lkml/2012/5/22/418.

Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Reported-by: Christoph Lameter <[email protected]>
Tested-by: Christoph Lameter <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
mm/hugetlb.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ca9ce49d71e3..f7b80540ba95 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2030,6 +2030,15 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma)
kref_get(&reservations->refs);
}

+static void resv_map_put(struct vm_area_struct *vma)
+{
+ struct resv_map *reservations = vma_resv_map(vma);
+
+ if (!reservations)
+ return;
+ kref_put(&reservations->refs, resv_map_release);
+}
+
static void hugetlb_vm_op_close(struct vm_area_struct *vma)
{
struct hstate *h = hstate_vma(vma);
@@ -2045,7 +2054,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
reserve = (end - start) -
region_count(&reservations->regions, start, end);

- kref_put(&reservations->refs, resv_map_release);
+ resv_map_put(vma);

if (reserve) {
hugetlb_acct_memory(h, -reserve);
@@ -2744,12 +2753,16 @@ int hugetlb_reserve_pages(struct inode *inode,
set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
}

- if (chg < 0)
- return chg;
+ if (chg < 0) {
+ ret = chg;
+ goto out_err;
+ }

/* There must be enough filesystem quota for the mapping */
- if (hugetlb_get_quota(inode->i_mapping, chg))
- return -ENOSPC;
+ if (hugetlb_get_quota(inode->i_mapping, chg)) {
+ ret = -ENOSPC;
+ goto out_err;
+ }

/*
* Check enough hugepages are available for the reservation.
@@ -2758,7 +2771,7 @@ int hugetlb_reserve_pages(struct inode *inode,
ret = hugetlb_acct_memory(h, chg);
if (ret < 0) {
hugetlb_put_quota(inode->i_mapping, chg);
- return ret;
+ goto out_err;
}

/*
@@ -2775,6 +2788,9 @@ int hugetlb_reserve_pages(struct inode *inode,
if (!vma || vma->vm_flags & VM_MAYSHARE)
region_add(&inode->i_mapping->private_list, from, to);
return 0;
+out_err:
+ resv_map_put(vma);
+ return ret;
}

void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
--
1.8.5.2

2014-02-05 20:53:04

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 090/213] coredump: prevent double-free on an error path in core dumper

From: Denys Vlasenko <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f34f9d186df35e5c39163444c43b4fc6255e39c5 upstream.

In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate
memory for info->fields, it frees already allocated stuff and returns
error to its caller, fill_note_info. Which in turn returns error to its
caller, elf_core_dump. Which jumps to cleanup label and calls
free_note_info, which will happily try to free all info->fields again.
BOOM.

This is the fix.

Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Denys Vlasenko <[email protected]>
Cc: Venu Byravarasu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/binfmt_elf.c | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index c21da8aebe46..eee4dd5a13cc 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1661,30 +1661,19 @@ static int elf_note_info_init(struct elf_note_info *info)
return 0;
info->psinfo = kmalloc(sizeof(*info->psinfo), GFP_KERNEL);
if (!info->psinfo)
- goto notes_free;
+ return 0;
info->prstatus = kmalloc(sizeof(*info->prstatus), GFP_KERNEL);
if (!info->prstatus)
- goto psinfo_free;
+ return 0;
info->fpu = kmalloc(sizeof(*info->fpu), GFP_KERNEL);
if (!info->fpu)
- goto prstatus_free;
+ return 0;
#ifdef ELF_CORE_COPY_XFPREGS
info->xfpu = kmalloc(sizeof(*info->xfpu), GFP_KERNEL);
if (!info->xfpu)
- goto fpu_free;
+ return 0;
#endif
return 1;
-#ifdef ELF_CORE_COPY_XFPREGS
- fpu_free:
- kfree(info->fpu);
-#endif
- prstatus_free:
- kfree(info->prstatus);
- psinfo_free:
- kfree(info->psinfo);
- notes_free:
- kfree(info->notes);
- return 0;
}

static int fill_note_info(struct elfhdr *elf, int phdrs,
--
1.8.5.2

2014-02-05 20:05:55

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 065/213] drop_monitor: Make updating data->skb smp safe

From: Neil Horman <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 3885ca785a3618593226687ced84f3f336dc3860 upstream.

Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
its smp protections. Specifically, its possible to replace data->skb while its
being written. This patch corrects that by making data->skb an rcu protected
variable. That will prevent it from being overwritten while a tracepoint is
modifying it.

Signed-off-by: Neil Horman <[email protected]>
Reported-by: Eric Dumazet <[email protected]>
CC: David Miller <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[PG: The "__rcu" sparse addition is only present in 2.6.37+ kernels.]
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/core/drop_monitor.c | 70 ++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 54 insertions(+), 16 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index e596db9fefd0..0b8aba761930 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -46,7 +46,7 @@ static DEFINE_MUTEX(trace_state_mutex);

struct per_cpu_dm_data {
struct work_struct dm_alert_work;
- struct sk_buff *skb;
+ struct sk_buff /* __rcu <-- only 2.6.37+ */ *skb;
atomic_t dm_hit_count;
struct timer_list send_timer;
};
@@ -73,35 +73,58 @@ static int dm_hit_limit = 64;
static int dm_delay = 1;
static unsigned long dm_hw_check_delta = 2*HZ;
static LIST_HEAD(hw_stats_list);
+static int initialized = 0;

static void reset_per_cpu_data(struct per_cpu_dm_data *data)
{
size_t al;
struct net_dm_alert_msg *msg;
struct nlattr *nla;
+ struct sk_buff *skb;
+ struct sk_buff *oskb = rcu_dereference_protected(data->skb, 1);

al = sizeof(struct net_dm_alert_msg);
al += dm_hit_limit * sizeof(struct net_dm_drop_point);
al += sizeof(struct nlattr);

- data->skb = genlmsg_new(al, GFP_KERNEL);
- genlmsg_put(data->skb, 0, 0, &net_drop_monitor_family,
- 0, NET_DM_CMD_ALERT);
- nla = nla_reserve(data->skb, NLA_UNSPEC, sizeof(struct net_dm_alert_msg));
- msg = nla_data(nla);
- memset(msg, 0, al);
- atomic_set(&data->dm_hit_count, dm_hit_limit);
+ skb = genlmsg_new(al, GFP_KERNEL);
+
+ if (skb) {
+ genlmsg_put(skb, 0, 0, &net_drop_monitor_family,
+ 0, NET_DM_CMD_ALERT);
+ nla = nla_reserve(skb, NLA_UNSPEC,
+ sizeof(struct net_dm_alert_msg));
+ msg = nla_data(nla);
+ memset(msg, 0, al);
+ } else if (initialized)
+ schedule_work_on(smp_processor_id(), &data->dm_alert_work);
+
+ /*
+ * Don't need to lock this, since we are guaranteed to only
+ * run this on a single cpu at a time.
+ * Note also that we only update data->skb if the old and new skb
+ * pointers don't match. This ensures that we don't continually call
+ * synchornize_rcu if we repeatedly fail to alloc a new netlink message.
+ */
+ if (skb != oskb) {
+ rcu_assign_pointer(data->skb, skb);
+
+ synchronize_rcu();
+
+ atomic_set(&data->dm_hit_count, dm_hit_limit);
+ }
+
}

static void send_dm_alert(struct work_struct *unused)
{
struct sk_buff *skb;
- struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data);
+ struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data);

/*
* Grab the skb we're about to send
*/
- skb = data->skb;
+ skb = rcu_dereference_protected(data->skb, 1);

/*
* Replace it with a new one
@@ -111,8 +134,10 @@ static void send_dm_alert(struct work_struct *unused)
/*
* Ship it!
*/
- genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL);
+ if (skb)
+ genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL);

+ put_cpu_var(dm_cpu_data);
}

/*
@@ -123,9 +148,11 @@ static void send_dm_alert(struct work_struct *unused)
*/
static void sched_send_work(unsigned long unused)
{
- struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data);
+ struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data);
+
+ schedule_work_on(smp_processor_id(), &data->dm_alert_work);

- schedule_work(&data->dm_alert_work);
+ put_cpu_var(dm_cpu_data);
}

static void trace_drop_common(struct sk_buff *skb, void *location)
@@ -134,9 +161,16 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
struct nlmsghdr *nlh;
struct nlattr *nla;
int i;
- struct per_cpu_dm_data *data = &__get_cpu_var(dm_cpu_data);
+ struct sk_buff *dskb;
+ struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data);


+ rcu_read_lock();
+ dskb = rcu_dereference(data->skb);
+
+ if (!dskb)
+ goto out;
+
if (!atomic_add_unless(&data->dm_hit_count, -1, 0)) {
/*
* we're already at zero, discard this hit
@@ -144,7 +178,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
goto out;
}

- nlh = (struct nlmsghdr *)data->skb->data;
+ nlh = (struct nlmsghdr *)dskb->data;
nla = genlmsg_data(nlmsg_data(nlh));
msg = nla_data(nla);
for (i = 0; i < msg->entries; i++) {
@@ -157,7 +191,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
/*
* We need to create a new entry
*/
- __nla_reserve_nohdr(data->skb, sizeof(struct net_dm_drop_point));
+ __nla_reserve_nohdr(dskb, sizeof(struct net_dm_drop_point));
nla->nla_len += NLA_ALIGN(sizeof(struct net_dm_drop_point));
memcpy(msg->points[msg->entries].pc, &location, sizeof(void *));
msg->points[msg->entries].count = 1;
@@ -169,6 +203,8 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
}

out:
+ rcu_read_unlock();
+ put_cpu_var(dm_cpu_data);
return;
}

@@ -385,6 +421,8 @@ static int __init init_net_drop_monitor(void)
data->send_timer.function = sched_send_work;
}

+ initialized = 1;
+
goto out;

out_unreg:
--
1.8.5.2

2014-02-05 20:05:49

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 067/213] drop_monitor: dont sleep in atomic context

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit bec4596b4e6770c7037f21f6bd27567b152dc0d6 upstream.

drop_monitor calls several sleeping functions while in atomic context.

BUG: sleeping function called from invalid context at mm/slub.c:943
in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2
Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55
Call Trace:
[<ffffffff810697ca>] __might_sleep+0xca/0xf0
[<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0
[<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130
[<ffffffff815343fb>] __alloc_skb+0x4b/0x230
[<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor]
[<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor]
[<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor]
[<ffffffff810568e0>] process_one_work+0x130/0x4c0
[<ffffffff81058249>] worker_thread+0x159/0x360
[<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240
[<ffffffff8105d403>] kthread+0x93/0xa0
[<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80
[<ffffffff816be6d0>] ? gs_change+0xb/0xb

Rework the logic to call the sleeping functions in right context.

Use standard timer/workqueue api to let system chose any cpu to perform
the allocation and netlink send.

Also avoid a loop if reset_per_cpu_data() cannot allocate memory :
use mod_timer() to wait 1/10 second before next try.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neil Horman <[email protected]>
Reviewed-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[PG: diffstat here is less by one line due to blank line removal]
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/core/drop_monitor.c | 101 ++++++++++++++++--------------------------------
1 file changed, 33 insertions(+), 68 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 58c57fcc67ad..cd3f2b942991 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -33,9 +33,6 @@
#define TRACE_ON 1
#define TRACE_OFF 0

-static void send_dm_alert(struct work_struct *unused);
-
-
/*
* Globals, our netlink socket pointer
* and the work handle that will send up
@@ -45,11 +42,10 @@ static int trace_state = TRACE_OFF;
static DEFINE_MUTEX(trace_state_mutex);

struct per_cpu_dm_data {
- struct work_struct dm_alert_work;
- struct sk_buff /* __rcu <-- only 2.6.37+ */ *skb;
- atomic_t dm_hit_count;
- struct timer_list send_timer;
- int cpu;
+ spinlock_t lock;
+ struct sk_buff *skb;
+ struct work_struct dm_alert_work;
+ struct timer_list send_timer;
};

struct dm_hw_stat_delta {
@@ -75,13 +71,13 @@ static int dm_delay = 1;
static unsigned long dm_hw_check_delta = 2*HZ;
static LIST_HEAD(hw_stats_list);

-static void reset_per_cpu_data(struct per_cpu_dm_data *data)
+static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
{
size_t al;
struct net_dm_alert_msg *msg;
struct nlattr *nla;
struct sk_buff *skb;
- struct sk_buff *oskb = rcu_dereference_protected(data->skb, 1);
+ unsigned long flags;

al = sizeof(struct net_dm_alert_msg);
al += dm_hit_limit * sizeof(struct net_dm_drop_point);
@@ -96,65 +92,40 @@ static void reset_per_cpu_data(struct per_cpu_dm_data *data)
sizeof(struct net_dm_alert_msg));
msg = nla_data(nla);
memset(msg, 0, al);
- } else
- schedule_work_on(data->cpu, &data->dm_alert_work);
-
- /*
- * Don't need to lock this, since we are guaranteed to only
- * run this on a single cpu at a time.
- * Note also that we only update data->skb if the old and new skb
- * pointers don't match. This ensures that we don't continually call
- * synchornize_rcu if we repeatedly fail to alloc a new netlink message.
- */
- if (skb != oskb) {
- rcu_assign_pointer(data->skb, skb);
-
- synchronize_rcu();
-
- atomic_set(&data->dm_hit_count, dm_hit_limit);
+ } else {
+ mod_timer(&data->send_timer, jiffies + HZ / 10);
}

+ spin_lock_irqsave(&data->lock, flags);
+ swap(data->skb, skb);
+ spin_unlock_irqrestore(&data->lock, flags);
+
+ return skb;
}

-static void send_dm_alert(struct work_struct *unused)
+static void send_dm_alert(struct work_struct *work)
{
struct sk_buff *skb;
- struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data);
+ struct per_cpu_dm_data *data;

- WARN_ON_ONCE(data->cpu != smp_processor_id());
+ data = container_of(work, struct per_cpu_dm_data, dm_alert_work);

- /*
- * Grab the skb we're about to send
- */
- skb = rcu_dereference_protected(data->skb, 1);
-
- /*
- * Replace it with a new one
- */
- reset_per_cpu_data(data);
+ skb = reset_per_cpu_data(data);

- /*
- * Ship it!
- */
if (skb)
genlmsg_multicast(skb, 0, NET_DM_GRP_ALERT, GFP_KERNEL);
-
- put_cpu_var(dm_cpu_data);
}

/*
* This is the timer function to delay the sending of an alert
* in the event that more drops will arrive during the
- * hysteresis period. Note that it operates under the timer interrupt
- * so we don't need to disable preemption here
+ * hysteresis period.
*/
-static void sched_send_work(unsigned long unused)
+static void sched_send_work(unsigned long _data)
{
- struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data);
-
- schedule_work_on(smp_processor_id(), &data->dm_alert_work);
+ struct per_cpu_dm_data *data = (struct per_cpu_dm_data *)_data;

- put_cpu_var(dm_cpu_data);
+ schedule_work(&data->dm_alert_work);
}

static void trace_drop_common(struct sk_buff *skb, void *location)
@@ -164,22 +135,17 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
struct nlattr *nla;
int i;
struct sk_buff *dskb;
- struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data);
-
+ struct per_cpu_dm_data *data;
+ unsigned long flags;

- rcu_read_lock();
- dskb = rcu_dereference(data->skb);
+ local_irq_save(flags);
+ data = &__get_cpu_var(dm_cpu_data);
+ spin_lock(&data->lock);
+ dskb = data->skb;

if (!dskb)
goto out;

- if (!atomic_add_unless(&data->dm_hit_count, -1, 0)) {
- /*
- * we're already at zero, discard this hit
- */
- goto out;
- }
-
nlh = (struct nlmsghdr *)dskb->data;
nla = genlmsg_data(nlmsg_data(nlh));
msg = nla_data(nla);
@@ -189,7 +155,8 @@ static void trace_drop_common(struct sk_buff *skb, void *location)
goto out;
}
}
-
+ if (msg->entries == dm_hit_limit)
+ goto out;
/*
* We need to create a new entry
*/
@@ -201,13 +168,11 @@ static void trace_drop_common(struct sk_buff *skb, void *location)

if (!timer_pending(&data->send_timer)) {
data->send_timer.expires = jiffies + dm_delay * HZ;
- add_timer_on(&data->send_timer, smp_processor_id());
+ add_timer(&data->send_timer);
}

out:
- rcu_read_unlock();
- put_cpu_var(dm_cpu_data);
- return;
+ spin_unlock_irqrestore(&data->lock, flags);
}

static void trace_kfree_skb_hit(struct sk_buff *skb, void *location)
@@ -416,11 +381,11 @@ static int __init init_net_drop_monitor(void)

for_each_present_cpu(cpu) {
data = &per_cpu(dm_cpu_data, cpu);
- data->cpu = cpu;
INIT_WORK(&data->dm_alert_work, send_dm_alert);
init_timer(&data->send_timer);
- data->send_timer.data = cpu;
+ data->send_timer.data = (unsigned long)data;
data->send_timer.function = sched_send_work;
+ spin_lock_init(&data->lock);
reset_per_cpu_data(data);
}

--
1.8.5.2

2014-02-05 20:53:47

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 091/213] kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()

From: Shawn Guo <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f96972f2dc6365421cf2366ebd61ee4cf060c8d5 upstream.

As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
have kernel_restart() call disable_nonboot_cpus(). Doing so can help
machines that require boot cpu be the last alive cpu during reboot to
survive with kernel restart.

This fixes one reboot issue seen on imx6q (Cortex-A9 Quad). The machine
requires that the restart routine be run on the primary cpu rather than
secondary ones. Otherwise, the secondary core running the restart
routine will fail to come to online after reboot.

Signed-off-by: Shawn Guo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/sys.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sys.c b/kernel/sys.c
index 0324c1cd8e7b..006883113861 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -308,6 +308,7 @@ void kernel_restart_prepare(char *cmd)
void kernel_restart(char *cmd)
{
kernel_restart_prepare(cmd);
+ disable_nonboot_cpus();
if (!cmd)
printk(KERN_EMERG "Restarting system.\n");
else
--
1.8.5.2

2014-02-05 20:53:43

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 093/213] Driver core: treat unregistered bus_types as having no devices

From: Bjorn Helgaas <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 4fa3e78be7e985ca814ce2aa0c09cbee404efcf7 upstream.

A bus_type has a list of devices (klist_devices), but the list and the
subsys_private structure that contains it are not initialized until the
bus_type is registered with bus_register().

The panic/reboot path has fixups that look up devices in pci_bus_type. If
we panic before registering pci_bus_type, the bus_type exists but the list
does not, so mach_reboot_fixups() trips over a null pointer and panics
again:

mach_reboot_fixups
pci_get_device
..
bus_find_device(&pci_bus_type, ...)
bus->p is NULL

Joonsoo reported a problem when panicking before PCI was initialized.
I think this patch should be sufficient to replace the patch he posted
here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip
reboot_fixups in early boot phase")

Reported-by: Joonsoo Kim <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/base/bus.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 12eec3f633b1..1913e74ee582 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -290,7 +290,7 @@ int bus_for_each_dev(struct bus_type *bus, struct device *start,
struct device *dev;
int error = 0;

- if (!bus)
+ if (!bus || !bus->p)
return -EINVAL;

klist_iter_init_node(&bus->p->klist_devices, &i,
@@ -324,7 +324,7 @@ struct device *bus_find_device(struct bus_type *bus,
struct klist_iter i;
struct device *dev;

- if (!bus)
+ if (!bus || !bus->p)
return NULL;

klist_iter_init_node(&bus->p->klist_devices, &i,
--
1.8.5.2

2014-02-05 20:54:26

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 088/213] kernel/signal.c: stop info leak via the tkill and the tgkill syscalls

From: Emese Revfy <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit b9e146d8eb3b9ecae5086d373b50fa0c1f3e7f0f upstream.

This fixes a kernel memory contents leak via the tkill and tgkill syscalls
for compat processes.

This is visible in the siginfo_t->_sifields._rt.si_sigval.sival_ptr field
when handling signals delivered from tkill.

The place of the infoleak:

int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
{
...
put_user_ex(ptr_to_compat(from->si_ptr), &to->si_ptr);
...
}

Signed-off-by: Emese Revfy <[email protected]>
Reviewed-by: PaX Team <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Serge Hallyn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 137a3333b444..06f13229a709 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2358,7 +2358,7 @@ do_send_specific(pid_t tgid, pid_t pid, int sig, struct siginfo *info)

static int do_tkill(pid_t tgid, pid_t pid, int sig)
{
- struct siginfo info;
+ struct siginfo info = {};

info.si_signo = sig;
info.si_errno = 0;
--
1.8.5.2

2014-02-05 20:54:22

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 083/213] tick: Cleanup NOHZ per cpu data on cpu down

From: Thomas Gleixner <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 4b0c0f294f60abcdd20994a8341a95c8ac5eeb96 upstream.

Prarit reported a crash on CPU offline/online. The reason is that on
CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
up. If at cpu online an interrupt happens before the per cpu tick
device is registered the irq_enter() check potentially sees stale data
and dereferences a NULL pointer.

Cleanup the data after the cpu is dead.

Reported-by: Prarit Bhargava <[email protected]>
Cc: Mike Galbraith <[email protected]>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/time/tick-sched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f992762d7f51..25d34e1a7911 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -769,7 +769,7 @@ void tick_cancel_sched_timer(int cpu)
hrtimer_cancel(&ts->sched_timer);
# endif

- ts->nohz_mode = NOHZ_MODE_INACTIVE;
+ memset(ts, 0, sizeof(*ts));
}
#endif

--
1.8.5.2

2014-02-05 20:55:00

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 081/213] posix-cpu-timers: Fix nanosleep task_struct leak

From: Stanislaw Gruszka <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e6c42c295e071dd74a66b5a9fcf4f44049888ed8 upstream.

The trinity fuzzer triggered a task_struct reference leak via
clock_nanosleep with CPU_TIMERs. do_cpu_nanosleep() calls
posic_cpu_timer_create(), but misses a corresponding
posix_cpu_timer_del() which leads to the task_struct reference leak.

Reported-and-tested-by: Tommi Rantala <[email protected]>
Signed-off-by: Stanislaw Gruszka <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/posix-cpu-timers.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index bc7704b3a443..3d83a1553a41 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -1544,8 +1544,10 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
while (!signal_pending(current)) {
if (timer.it.cpu.expires.sched == 0) {
/*
- * Our timer fired and was reset.
+ * Our timer fired and was reset, below
+ * deletion can not fail.
*/
+ posix_cpu_timer_del(&timer);
spin_unlock_irq(&timer.it_lock);
return 0;
}
@@ -1563,9 +1565,26 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
* We were interrupted by a signal.
*/
sample_to_timespec(which_clock, timer.it.cpu.expires, rqtp);
- posix_cpu_timer_set(&timer, 0, &zero_it, it);
+ error = posix_cpu_timer_set(&timer, 0, &zero_it, it);
+ if (!error) {
+ /*
+ * Timer is now unarmed, deletion can not fail.
+ */
+ posix_cpu_timer_del(&timer);
+ }
spin_unlock_irq(&timer.it_lock);

+ while (error == TIMER_RETRY) {
+ /*
+ * We need to handle case when timer was or is in the
+ * middle of firing. In other cases we already freed
+ * resources.
+ */
+ spin_lock_irq(&timer.it_lock);
+ error = posix_cpu_timer_del(&timer);
+ spin_unlock_irq(&timer.it_lock);
+ }
+
if ((it->it_value.tv_sec | it->it_value.tv_nsec) == 0) {
/*
* It actually did fire already.
--
1.8.5.2

2014-02-05 20:54:57

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 094/213] cgroup: remove incorrect dget/dput() pair in cgroup_create_dir()

From: Tejun Heo <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 175431635ec09b1d1bba04979b006b99e8305a83 upstream.

cgroup_create_dir() does weird dancing with dentry refcnt. On
success, it gets and then puts it achieving nothing. On failure, it
puts but there isn't no matching get anywhere leading to the following
oops if cgroup_create_file() fails for whatever reason.

------------[ cut here ]------------
kernel BUG at /work/os/work/fs/dcache.c:552!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU 2
Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs
RIP: 0010:[<ffffffff811d9c0c>] [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
RSP: 0018:ffff88001a3ebef8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403
RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58
RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001
R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea
R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60
FS: 00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080)
Stack:
ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000
ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8
ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8
Call Trace:
[<ffffffff811cc889>] done_path_create+0x19/0x50
[<ffffffff811d1fc9>] sys_mkdirat+0x59/0x80
[<ffffffff811d2009>] sys_mkdir+0x19/0x20
[<ffffffff81be1e02>] system_call_fastpath+0x16/0x1b
Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41
RIP [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
RSP <ffff88001a3ebef8>
---[ end trace 1277bcfd9561ddb0 ]---

Fix it by dropping the unnecessary dget/dput() pair.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/cgroup.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index bf4f78f026e8..477bb7e17779 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2187,9 +2187,7 @@ static int cgroup_create_dir(struct cgroup *cgrp, struct dentry *dentry,
dentry->d_fsdata = cgrp;
inc_nlink(parent->d_inode);
rcu_assign_pointer(cgrp->dentry, dentry);
- dget(dentry);
}
- dput(dentry);

return error;
}
--
1.8.5.2

2014-02-05 20:55:44

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 086/213] exec: use -ELOOP for max recursion depth

From: Kees Cook <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit d740269867021faf4ce38a449353d2b986c34a67 upstream.

To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.

This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:

if (cmd != path_bshell && errno == ENOEXEC) {
*argv-- = cmd;
*argv = cmd = path_bshell;
goto repeat;
}

The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.

Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.

Signed-off-by: Kees Cook <[email protected]>
Cc: halfdog <[email protected]>
Cc: P J P <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/binfmt_em86.c | 1 -
fs/binfmt_misc.c | 6 ------
fs/binfmt_script.c | 4 +---
fs/exec.c | 10 +++++-----
include/linux/binfmts.h | 2 --
5 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index b8e8b0acf9bd..4a1b984638a3 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -42,7 +42,6 @@ static int load_em86(struct linux_binprm *bprm,struct pt_regs *regs)
return -ENOEXEC;
}

- bprm->recursion_depth++; /* Well, the bang-shell is implicit... */
allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index fb939976d58c..258c5ca3f534 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -116,10 +116,6 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
if (!enabled)
goto _ret;

- retval = -ENOEXEC;
- if (bprm->recursion_depth > BINPRM_MAX_RECURSION)
- goto _ret;
-
/* to keep locking time low, we copy the interpreter string */
read_lock(&entries_lock);
fmt = check_file(bprm);
@@ -200,8 +196,6 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
if (retval < 0)
goto _error;

- bprm->recursion_depth++;
-
retval = search_binary_handler (bprm, regs);
if (retval < 0)
goto _error;
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 73d51f39c89a..65a3c1732ced 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -21,15 +21,13 @@ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
char interp[BINPRM_BUF_SIZE];
int retval;

- if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') ||
- (bprm->recursion_depth > BINPRM_MAX_RECURSION))
+ if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
return -ENOEXEC;
/*
* This section does the #! interpretation.
* Sorta complicated, but hopefully it will work. -TYT
*/

- bprm->recursion_depth++;
allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
diff --git a/fs/exec.c b/fs/exec.c
index 0ee94fe2fe37..aa3d2ec58c97 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1296,6 +1296,10 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
int try,retval;
struct linux_binfmt *fmt;

+ /* This allows 4 levels of binfmt rewrites before failing hard. */
+ if (depth > 5)
+ return -ELOOP;
+
retval = security_bprm_check(bprm);
if (retval)
return retval;
@@ -1314,12 +1318,8 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
if (!try_module_get(fmt->module))
continue;
read_unlock(&binfmt_lock);
+ bprm->recursion_depth = depth + 1;
retval = fn(bprm, regs);
- /*
- * Restore the depth counter to its starting value
- * in this call, so we don't have to rely on every
- * load_binary function to restore it on return.
- */
bprm->recursion_depth = depth;
if (retval >= 0) {
if (depth == 0)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 8e0957df83bb..d0ddba228449 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -71,8 +71,6 @@ extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
#define BINPRM_FLAGS_EXECFD_BIT 1
#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)

-#define BINPRM_MAX_RECURSION 4
-
/* Function parameter for binfmt->coredump */
struct coredump_params {
long signr;
--
1.8.5.2

2014-02-05 20:56:26

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 073/213] b43legacy: Fix crash on unload when firmware not available

From: Larry Finger <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 2d838bb608e2d1f6cb4280e76748cb812dc822e7 upstream.

When b43legacy is loaded without the firmware being available, a following
unload generates a kernel NULL pointer dereference BUG as follows:

[ 214.330789] BUG: unable to handle kernel NULL pointer dereference at 0000004c
[ 214.330997] IP: [<c104c395>] drain_workqueue+0x15/0x170
[ 214.331179] *pde = 00000000
[ 214.331311] Oops: 0000 [#1] SMP
[ 214.331471] Modules linked in: b43legacy(-) ssb pcmcia mac80211 cfg80211 af_packet mperf arc4 ppdev sr_mod cdrom sg shpchp yenta_socket pcmcia_rsrc pci_hotplug pcmcia_core battery parport_pc parport floppy container ac button edd autofs4 ohci_hcd ehci_hcd usbcore usb_common thermal processor scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh fan thermal_sys hwmon ata_generic pata_ali libata [last unloaded: cfg80211]
[ 214.333421] Pid: 3639, comm: modprobe Not tainted 3.6.0-rc6-wl+ #163 Source Technology VIC 9921/ALI Based Notebook
[ 214.333580] EIP: 0060:[<c104c395>] EFLAGS: 00010246 CPU: 0
[ 214.333687] EIP is at drain_workqueue+0x15/0x170
[ 214.333788] EAX: c162ac40 EBX: cdfb8360 ECX: 0000002a EDX: 00002a2a
[ 214.333890] ESI: 00000000 EDI: 00000000 EBP: cd767e7c ESP: cd767e5c
[ 214.333957] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 214.333957] CR0: 8005003b CR2: 0000004c CR3: 0c96a000 CR4: 00000090
[ 214.333957] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 214.333957] DR6: ffff0ff0 DR7: 00000400
[ 214.333957] Process modprobe (pid: 3639, ti=cd766000 task=cf802e90 task.ti=cd766000)
[ 214.333957] Stack:
[ 214.333957] 00000292 cd767e74 c12c5e09 00000296 00000296 cdfb8360 cdfb9220 00000000
[ 214.333957] cd767e90 c104c4fd cdfb8360 cdfb9220 cd682800 cd767ea4 d0c10184 cd682800
[ 214.333957] cd767ea4 cba31064 cd767eb8 d0867908 cba31064 d087e09c cd96f034 cd767ec4
[ 214.333957] Call Trace:
[ 214.333957] [<c12c5e09>] ? skb_dequeue+0x49/0x60
[ 214.333957] [<c104c4fd>] destroy_workqueue+0xd/0x150
[ 214.333957] [<d0c10184>] ieee80211_unregister_hw+0xc4/0x100 [mac80211]
[ 214.333957] [<d0867908>] b43legacy_remove+0x78/0x80 [b43legacy]
[ 214.333957] [<d083654d>] ssb_device_remove+0x1d/0x30 [ssb]
[ 214.333957] [<c126f15a>] __device_release_driver+0x5a/0xb0
[ 214.333957] [<c126fb07>] driver_detach+0x87/0x90
[ 214.333957] [<c126ef4c>] bus_remove_driver+0x6c/0xe0
[ 214.333957] [<c1270120>] driver_unregister+0x40/0x70
[ 214.333957] [<d083686b>] ssb_driver_unregister+0xb/0x10 [ssb]
[ 214.333957] [<d087c488>] b43legacy_exit+0xd/0xf [b43legacy]
[ 214.333957] [<c1089dde>] sys_delete_module+0x14e/0x2b0
[ 214.333957] [<c110a4a7>] ? vfs_write+0xf7/0x150
[ 214.333957] [<c1240050>] ? tty_write_lock+0x50/0x50
[ 214.333957] [<c110a6f8>] ? sys_write+0x38/0x70
[ 214.333957] [<c1397c55>] syscall_call+0x7/0xb
[ 214.333957] Code: bc 27 00 00 00 00 a1 74 61 56 c1 55 89 e5 e8 a3 fc ff ff 5d c3 90 55 89 e5 57 56 89 c6 53 b8 40 ac 62 c1 83 ec 14 e8 bb b7 34 00 <8b> 46 4c 8d 50 01 85 c0 89 56 4c 75 03 83 0e 40 80 05 40 ac 62
[ 214.333957] EIP: [<c104c395>] drain_workqueue+0x15/0x170 SS:ESP 0068:cd767e5c
[ 214.333957] CR2: 000000000000004c
[ 214.341110] ---[ end trace c7e90ec026d875a6 ]---Index: wireless-testing/drivers/net/wireless/b43legacy/main.c

The problem is fixed by making certain that the ucode pointer is not NULL
before deregistering the driver in mac80211.

Signed-off-by: Larry Finger <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/net/wireless/b43legacy/main.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/b43legacy/main.c b/drivers/net/wireless/b43legacy/main.c
index bb2dd9329aa0..40112d49ec3e 100644
--- a/drivers/net/wireless/b43legacy/main.c
+++ b/drivers/net/wireless/b43legacy/main.c
@@ -3849,6 +3849,8 @@ static void b43legacy_remove(struct ssb_device *dev)
cancel_work_sync(&wldev->restart_work);

B43legacy_WARN_ON(!wl);
+ if (!wldev->fw.ucode)
+ return; /* NULL if fw never loaded */
if (wl->current_dev == wldev)
ieee80211_unregister_hw(wl->hw);

--
1.8.5.2

2014-02-05 20:05:34

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 078/213] kernel panic when mount NFSv4

From: Trond Myklebust <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit beb0f0a9fba1fa98b378329a9a5b0a73f25097ae upstream.

On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:
> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at NFS client's __rpc_create_common function.
>
> The panic place is:
> rpc_mkpipe
> __rpc_lookup_create() <=== find pipefile *idmap*
> __rpc_mkpipe() <=== pipefile is *idmap*
> __rpc_create_common()
> ****** BUG_ON(!d_unhashed(dentry)); ****** *panic*
>
> It means that the dentry's d_flags have be set DCACHE_UNHASHED,
> but it should not be set here.
>
> Is someone known this bug? or give me some idea?
>
> A reproduce program is append, but it can't reproduce the bug every time.
> the export is: "/nfsroot *(rw,no_root_squash,fsid=0,insecure)"
>
> And the panic message is append.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
> ((LOOPCOUNT += 1))
> service nfs restart
> /usr/sbin/rpc.idmapd
> mount -t nfs4 127.0.0.1:/ /mnt|| return 1;
> ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
> umount /mnt
> echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP:[<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2]---
> Kernel panic - not syncing: Fatal exception
> Pid:7131, comm: mount.nfs4 Tainted: G D -------------------2.6.32 #1
> Call Trace:
> [<c080ad18>] ? panic+0x42/0xed
> [<c080e42c>] ? oops_end+0xbc/0xd0
> [<c040b090>] ? do_invalid_op+0x0/0x90
> [<c040b10f>] ? do_invalid_op+0x7f/0x90
> [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
> [<f0edc433>] ? rpc_free_task+0x33/0x70[sunrpc]
> [<f0ed6508>] ? prc_call_sync+0x48/0x60[sunrpc]
> [<f0ed656e>] ? rpc_ping+0x4e/0x60[sunrpc]
> [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0[sunrpc]
> [<c080d80b>] ? error_code+0x73/0x78
> [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
> [<c0532bda>] ? d_lookup+0x2a/0x40
> [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0[sunrpc]
> [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0[nfs]
> [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs]
> [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140[nfs]
> [<c05e76aa>] ? strlcpy+0x3a/0x60
> [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0[nfs]
> [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0[nfs]
> [<c04f1400>] ? krealloc+0x40/0x50
> [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250[nfs]
> [<c04f14ec>] ? kstrdup+0x3c/0x60
> [<c0520739>] ? vfs_kern_mount+0x69/0x170
> [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0[nfs]
> [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0[nfs]
> [<f10afe6d>] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs]
> [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0
> [<c0520739>] ? vfs_kern_mount+0x69/0x170
> [<c05366d2>] ? get_fs_type+0x32/0xb0
> [<c052089f>] ? do_kern_mount+0x3f/0xe0
> [<c053954f>] ? do_mount+0x2ef/0x740
> [<c0537740>] ? copy_mount_options+0xb0/0x120
> [<c0539a0e>] ? sys_mount+0x6e/0xa0

Hi,

Does the following patch fix the problem?

Cheers
Trond

--------------------------
SUNRPC: Fix a BUG in __rpc_create_common

From: Trond Myklebust <[email protected]>

Mi Jinlong reports:

When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
at NFS client's __rpc_create_common function.

The panic place is:
rpc_mkpipe
__rpc_lookup_create() <=== find pipefile *idmap*
__rpc_mkpipe() <=== pipefile is *idmap*
__rpc_create_common()
****** BUG_ON(!d_unhashed(dentry)); ****** *panic*

The test is wrong: we can find ourselves with a hashed negative dentry here
if the idmapper tried to look up the file before we got round to creating
it.

Just replace the BUG_ON() with a d_drop(dentry).

Reported-by: Mi Jinlong <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sunrpc/rpc_pipe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index dbf50f9f7ed2..f4da18e14c0b 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -459,7 +459,7 @@ static int __rpc_create_common(struct inode *dir, struct dentry *dentry,
{
struct inode *inode;

- BUG_ON(!d_unhashed(dentry));
+ d_drop(dentry);
inode = rpc_get_inode(dir->i_sb, mode);
if (!inode)
goto out_err;
--
1.8.5.2

2014-02-05 20:56:46

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 064/213] drop_monitor: fix sleeping in invalid context warning

From: Neil Horman <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit cde2e9a651b76d8db36ae94cd0febc82b637e5dd upstream.

Eric Dumazet pointed out this warning in the drop_monitor protocol to me:

[ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
[ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
[ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
[ 38.352582] Call Trace:
[ 38.352592] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[ 38.352599] [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
[ 38.352606] [<ffffffff81655b16>] mutex_lock+0x26/0x50
[ 38.352610] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[ 38.352616] [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
[ 38.352621] [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
[ 38.352625] [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
[ 38.352630] [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
[ 38.352636] [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
[ 38.352640] [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
[ 38.352645] [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
[ 38.352649] [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
[ 38.352653] [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
[ 38.352658] [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
[ 38.352663] [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
[ 38.352668] [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
[ 38.352673] [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
[ 38.352677] [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
[ 38.352682] [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
[ 38.352687] [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
[ 38.352693] [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
[ 38.352699] [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
[ 38.352703] [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
[ 38.352708] [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
[ 38.352713] [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b

It stems from holding a spinlock (trace_state_lock) while attempting to register
or unregister tracepoint hooks, making in_atomic() true in this context, leading
to the warning when the tracepoint calls might_sleep() while its taking a mutex.
Since we only use the trace_state_lock to prevent trace protocol state races, as
well as hardware stat list updates on an rcu write side, we can just convert the
spinlock to a mutex to avoid this problem.

Signed-off-by: Neil Horman <[email protected]>
Reported-by: Eric Dumazet <[email protected]>
CC: David Miller <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/core/drop_monitor.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index cf208d8042b1..e596db9fefd0 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -42,7 +42,7 @@ static void send_dm_alert(struct work_struct *unused);
* netlink alerts
*/
static int trace_state = TRACE_OFF;
-static DEFINE_SPINLOCK(trace_state_lock);
+static DEFINE_MUTEX(trace_state_mutex);

struct per_cpu_dm_data {
struct work_struct dm_alert_work;
@@ -221,7 +221,7 @@ static int set_all_monitor_traces(int state)
struct dm_hw_stat_delta *new_stat = NULL;
struct dm_hw_stat_delta *temp;

- spin_lock(&trace_state_lock);
+ mutex_lock(&trace_state_mutex);

switch (state) {
case TRACE_ON:
@@ -252,7 +252,7 @@ static int set_all_monitor_traces(int state)
if (!rc)
trace_state = state;

- spin_unlock(&trace_state_lock);
+ mutex_unlock(&trace_state_mutex);

if (rc)
return -EINPROGRESS;
@@ -297,12 +297,12 @@ static int dropmon_net_event(struct notifier_block *ev_block,

new_stat->dev = dev;
new_stat->last_rx = jiffies;
- spin_lock(&trace_state_lock);
+ mutex_lock(&trace_state_mutex);
list_add_rcu(&new_stat->list, &hw_stats_list);
- spin_unlock(&trace_state_lock);
+ mutex_unlock(&trace_state_mutex);
break;
case NETDEV_UNREGISTER:
- spin_lock(&trace_state_lock);
+ mutex_lock(&trace_state_mutex);
list_for_each_entry_safe(new_stat, tmp, &hw_stats_list, list) {
if (new_stat->dev == dev) {
new_stat->dev = NULL;
@@ -313,7 +313,7 @@ static int dropmon_net_event(struct notifier_block *ev_block,
}
}
}
- spin_unlock(&trace_state_lock);
+ mutex_unlock(&trace_state_mutex);
break;
}
out:
--
1.8.5.2

2014-02-05 20:57:11

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 063/213] tcp: do_tcp_sendpages() must try to push data out on oom conditions

From: Willy Tarreau <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit bad115cfe5b509043b684d3a007ab54b80090aa1 upstream.

Since recent changes on TCP splicing (starting with commits 2f533844
"tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
tcp_sendpages() should call tcp_push() once"), I started seeing
massive stalls when forwarding traffic between two sockets using
splice() when pipe buffers were larger than socket buffers.

Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.

The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.

After applying the attached patch, I cannot reproduce the stalls at all
and the data rate it perfectly stable and steady under any condition
which previously caused the problem to be permanent.

The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.

This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.

Signed-off-by: Willy Tarreau <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ipv4/tcp.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index df671c76f196..392d594521c1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -839,8 +839,7 @@ new_segment:
wait_for_sndbuf:
set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
- if (copied)
- tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
+ tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);

if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
goto do_error;
--
1.8.5.2

2014-02-05 20:05:32

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 066/213] drop_monitor: prevent init path from scheduling on the wrong cpu

From: Neil Horman <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 4fdcfa12843bca38d0c9deff70c8720e4e8f515f upstream.

I just noticed after some recent updates, that the init path for the drop
monitor protocol has a minor error. drop monitor maintains a per cpu structure,
that gets initalized from a single cpu. Normally this is fine, as the protocol
isn't in use yet, but I recently made a change that causes a failed skb
allocation to reschedule itself . Given the current code, the implication is
that this workqueue reschedule will take place on the wrong cpu. If drop
monitor is used early during the boot process, its possible that two cpus will
access a single per-cpu structure in parallel, possibly leading to data
corruption.

This patch fixes the situation, by storing the cpu number that a given instance
of this per-cpu data should be accessed from. In the case of a need for a
reschedule, the cpu stored in the struct is assigned the rescheule, rather than
the currently executing cpu

Tested successfully by myself.

Signed-off-by: Neil Horman <[email protected]>
CC: David Miller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/core/drop_monitor.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 0b8aba761930..58c57fcc67ad 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -49,6 +49,7 @@ struct per_cpu_dm_data {
struct sk_buff /* __rcu <-- only 2.6.37+ */ *skb;
atomic_t dm_hit_count;
struct timer_list send_timer;
+ int cpu;
};

struct dm_hw_stat_delta {
@@ -73,7 +74,6 @@ static int dm_hit_limit = 64;
static int dm_delay = 1;
static unsigned long dm_hw_check_delta = 2*HZ;
static LIST_HEAD(hw_stats_list);
-static int initialized = 0;

static void reset_per_cpu_data(struct per_cpu_dm_data *data)
{
@@ -96,8 +96,8 @@ static void reset_per_cpu_data(struct per_cpu_dm_data *data)
sizeof(struct net_dm_alert_msg));
msg = nla_data(nla);
memset(msg, 0, al);
- } else if (initialized)
- schedule_work_on(smp_processor_id(), &data->dm_alert_work);
+ } else
+ schedule_work_on(data->cpu, &data->dm_alert_work);

/*
* Don't need to lock this, since we are guaranteed to only
@@ -121,6 +121,8 @@ static void send_dm_alert(struct work_struct *unused)
struct sk_buff *skb;
struct per_cpu_dm_data *data = &get_cpu_var(dm_cpu_data);

+ WARN_ON_ONCE(data->cpu != smp_processor_id());
+
/*
* Grab the skb we're about to send
*/
@@ -414,14 +416,14 @@ static int __init init_net_drop_monitor(void)

for_each_present_cpu(cpu) {
data = &per_cpu(dm_cpu_data, cpu);
- reset_per_cpu_data(data);
+ data->cpu = cpu;
INIT_WORK(&data->dm_alert_work, send_dm_alert);
init_timer(&data->send_timer);
data->send_timer.data = cpu;
data->send_timer.function = sched_send_work;
+ reset_per_cpu_data(data);
}

- initialized = 1;

goto out;

--
1.8.5.2

2014-02-05 20:05:22

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 058/213] net/core: Fix potential memory leak in dev_set_alias()

From: Alexey Khoroshilov <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 7364e445f62825758fa61195d237a5b8ecdd06ec upstream.

Do not leak memory by updating pointer with potentially NULL realloc return value.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/core/dev.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 253b409fe3ab..3dbbba137c93 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1034,6 +1034,8 @@ rollback:
*/
int dev_set_alias(struct net_device *dev, const char *alias, size_t len)
{
+ char *new_ifalias;
+
ASSERT_RTNL();

if (len >= IFALIASZ)
@@ -1047,9 +1049,10 @@ int dev_set_alias(struct net_device *dev, const char *alias, size_t len)
return 0;
}

- dev->ifalias = krealloc(dev->ifalias, len + 1, GFP_KERNEL);
- if (!dev->ifalias)
+ new_ifalias = krealloc(dev->ifalias, len + 1, GFP_KERNEL);
+ if (!new_ifalias)
return -ENOMEM;
+ dev->ifalias = new_ifalias;

strlcpy(dev->ifalias, alias, len+1);
return len;
--
1.8.5.2

2014-02-05 20:58:24

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 050/213] af_packet: remove BUG statement in tpacket_destruct_skb

From: "[email protected]" <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 7f5c3e3a80e6654cf48dfba7cf94f88c6b505467 upstream.

Here's a quote of the comment about the BUG macro from asm-generic/bug.h:

Don't use BUG() or BUG_ON() unless there's really no way out; one
example might be detecting data structure corruption in the middle
of an operation that can't be backed out of. If the (sub)system
can somehow continue operating, perhaps with reduced functionality,
it's probably not BUG-worthy.

If you're tempted to BUG(), think again: is completely giving up
really the *only* solution? There are usually better options, where
users don't need to reboot ASAP and can mostly shut down cleanly.

In our case, the status flag of a ring buffer slot is managed from both sides,
the kernel space and the user space. This means that even though the kernel
side might work as expected, the user space screws up and changes this flag
right between the send(2) is triggered when the flag is changed to
TP_STATUS_SENDING and a given skb is destructed after some time. Then, this
will hit the BUG macro. As David suggested, the best solution is to simply
remove this statement since it cannot be used for kernel side internal
consistency checks. I've tested it and the system still behaves /stable/ in
this case, so in accordance with the above comment, we should rather remove it.

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/packet/af_packet.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 4096a66f6379..dbe4dd160631 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -812,7 +812,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)

if (likely(po->tx_ring.pg_vec)) {
ph = skb_shinfo(skb)->destructor_arg;
- BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
BUG_ON(atomic_read(&po->tx_ring.pending) == 0);
atomic_dec(&po->tx_ring.pending);
__packet_set_status(po, ph, TP_STATUS_AVAILABLE);
--
1.8.5.2

2014-02-05 20:58:20

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 076/213] NFSv3: Ensure that do_proc_get_root() reports errors correctly

From: Trond Myklebust <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 086600430493e04b802bee6e5b3ce0458e4eb77f upstream.

If the rpc call to NFS3PROC_FSINFO fails, then we need to report that
error so that the mount fails. Otherwise we can end up with a
superblock with completely unusable values for block sizes, maxfilesize,
etc.

Reported-by: Yuanming Chen <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/nfs/nfs3proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index e701002694e5..4958497da25b 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -68,7 +68,7 @@ do_proc_get_root(struct rpc_clnt *client, struct nfs_fh *fhandle,
nfs_fattr_init(info->fattr);
status = rpc_call_sync(client, &msg, 0);
dprintk("%s: reply fsinfo: %d\n", __func__, status);
- if (!(info->fattr->valid & NFS_ATTR_FATTR)) {
+ if (status == 0 && !(info->fattr->valid & NFS_ATTR_FATTR)) {
msg.rpc_proc = &nfs3_procedures[NFS3PROC_GETATTR];
msg.rpc_resp = info->fattr;
status = rpc_call_sync(client, &msg, 0);
--
1.8.5.2

2014-02-05 20:58:16

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 075/213] nfsd4: fix oops on unusual readlike compound

From: "J. Bruce Fields" <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit d5f50b0c290431c65377c4afa1c764e2c3fe5305 upstream.

If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overlow the rq_pages array.

Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/nfsd/nfs4xdr.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 28d586a5e77e..3800e7680037 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2607,11 +2607,16 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
len = maxcount;
v = 0;
while (len > 0) {
- pn = resp->rqstp->rq_resused++;
+ pn = resp->rqstp->rq_resused;
+ if (!resp->rqstp->rq_respages[pn]) { /* ran out of pages */
+ maxcount -= len;
+ break;
+ }
resp->rqstp->rq_vec[v].iov_base =
page_address(resp->rqstp->rq_respages[pn]);
resp->rqstp->rq_vec[v].iov_len =
len < PAGE_SIZE ? len : PAGE_SIZE;
+ resp->rqstp->rq_resused++;
v++;
len -= PAGE_SIZE;
}
@@ -2659,6 +2664,8 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
return nfserr;
if (resp->xbuf->page_len)
return nfserr_resource;
+ if (!resp->rqstp->rq_respages[resp->rqstp->rq_resused])
+ return nfserr_resource;

page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused++]);

@@ -2708,6 +2715,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
return nfserr;
if (resp->xbuf->page_len)
return nfserr_resource;
+ if (!resp->rqstp->rq_respages[resp->rqstp->rq_resused])
+ return nfserr_resource;

RESERVE_SPACE(8); /* verifier */
savep = p;
--
1.8.5.2

2014-02-05 20:59:29

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 072/213] xfrm_user: return error pointer instead of NULL #2

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c25463722509fef0ed630b271576a8c9a70236f3 upstream.

When dump_one_policy() returns an error, e.g. because of a too small
buffer to dump the whole xfrm policy, xfrm_policy_netlink() returns
NULL instead of an error pointer. But its caller expects an error
pointer and therefore continues to operate on a NULL skbuff.

Signed-off-by: Mathias Krause <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/xfrm/xfrm_user.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index bbc09e0c1ae7..88c6c8d59d79 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1435,6 +1435,7 @@ static struct sk_buff *xfrm_policy_netlink(struct sk_buff *in_skb,
{
struct xfrm_dump_info info;
struct sk_buff *skb;
+ int err;

skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!skb)
@@ -1445,9 +1446,10 @@ static struct sk_buff *xfrm_policy_netlink(struct sk_buff *in_skb,
info.nlmsg_seq = seq;
info.nlmsg_flags = 0;

- if (dump_one_policy(xp, dir, 0, &info) < 0) {
+ err = dump_one_policy(xp, dir, 0, &info);
+ if (err) {
kfree_skb(skb);
- return NULL;
+ return ERR_PTR(err);
}

return skb;
--
1.8.5.2

2014-02-05 20:59:39

by Julian Anastasov

[permalink] [raw]
Subject: Re: [v2.6.34-stable 032/213] ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)


Hello,

On Wed, 5 Feb 2014, Julian Anastasov wrote:

> On Wed, 5 Feb 2014, Paul Gortmaker wrote:
>
> > From: Mathias Krause <[email protected]>
> >
> > -------------------
> > This is a commit scheduled for the next v2.6.34 longterm release.
> > http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
> > If you see a problem with using this for longterm, please comment.
> > -------------------
> >
> > commit 2d8a041b7bfe1097af21441cb77d6af95f4f4680 upstream.
> >
> > If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
> > not set, __ip_vs_get_timeouts() does not fully initialize the structure
> > that gets copied to userland and that for leaks up to 12 bytes of kernel
> > stack. Add an explicit memset(0) before passing the structure to
> > __ip_vs_get_timeouts() to avoid the info leak.
>
> I guess, this patch is not needed after commit
> "ipvs: initialize returned data in do_ip_vs_get_ctl" because
> the memset is already moved into __ip_vs_get_timeouts().

Sorry, I thought this patch comes after commit
b61a602ee6730150f4d0df730d9312ac4d820ceb
("ipvs: initialize returned data in do_ip_vs_get_ctl")
but it looks like it is ok with this single change,
sorry for the noise!

Regards

--
Julian Anastasov <[email protected]>

2014-02-05 21:00:07

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 070/213] xfrm_user: fix info leak in copy_to_user_tmpl()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 1f86840f897717f86d523a13e99a447e6a5d2fa5 upstream.

The memory used for the template copy is a local stack variable. As
struct xfrm_user_tmpl contains multiple holes added by the compiler for
alignment, not initializing the memory will lead to leaking stack bytes
to userland. Add an explicit memset(0) to avoid the info leak.

Initial version of the patch by Brad Spengler.

Cc: Brad Spengler <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/xfrm/xfrm_user.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2f72480e6b8d..44d18db58cbb 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1304,6 +1304,7 @@ static int copy_to_user_tmpl(struct xfrm_policy *xp, struct sk_buff *skb)
struct xfrm_user_tmpl *up = &vec[i];
struct xfrm_tmpl *kp = &xp->xfrm_vec[i];

+ memset(up, 0, sizeof(*up));
memcpy(&up->id, &kp->id, sizeof(up->id));
up->family = kp->encap_family;
memcpy(&up->saddr, &kp->saddr, sizeof(up->saddr));
--
1.8.5.2

2014-02-05 20:05:19

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 057/213] net: reduce net_rx_action() latency to 2 HZ

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit d1f41b67ff7735193bc8b418b98ac99a448833e2 upstream.

We should use time_after_eq() to get maximum latency of two ticks,
instead of three.

Bug added in commit 24f8b2385 (net: increase receive packet quantum)

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/core/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 2aaf2e610f92..253b409fe3ab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3045,7 +3045,7 @@ static void net_rx_action(struct softirq_action *h)
* Allow this to run for 2 jiffies since which will allow
* an average latency of 1.5/HZ.
*/
- if (unlikely(budget <= 0 || time_after(jiffies, time_limit)))
+ if (unlikely(budget <= 0 || time_after_eq(jiffies, time_limit)))
goto softnet_break;

local_irq_enable();
--
1.8.5.2

2014-02-05 21:00:29

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 071/213] xfrm_user: return error pointer instead of NULL

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 864745d291b5ba80ea0bd0edcbe67273de368836 upstream.

When dump_one_state() returns an error, e.g. because of a too small
buffer to dump the whole xfrm state, xfrm_state_netlink() returns NULL
instead of an error pointer. But its callers expect an error pointer
and therefore continue to operate on a NULL skbuff.

This could lead to a privilege escalation (execution of user code in
kernel context) if the attacker has CAP_NET_ADMIN and is able to map
address 0.

Signed-off-by: Mathias Krause <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/xfrm/xfrm_user.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 44d18db58cbb..bbc09e0c1ae7 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -765,6 +765,7 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
{
struct xfrm_dump_info info;
struct sk_buff *skb;
+ int err;

skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
if (!skb)
@@ -775,9 +776,10 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
info.nlmsg_seq = seq;
info.nlmsg_flags = 0;

- if (dump_one_state(x, 0, &info)) {
+ err = dump_one_state(x, 0, &info);
+ if (err) {
kfree_skb(skb);
- return NULL;
+ return ERR_PTR(err);
}

return skb;
--
1.8.5.2

2014-02-05 21:00:54

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 069/213] xfrm_user: fix info leak in copy_to_user_policy()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 7b789836f434c87168eab067cfbed1ec4783dffd upstream.

The memory reserved to dump the xfrm policy includes multiple padding
bytes added by the compiler for alignment (padding bytes in struct
xfrm_selector and struct xfrm_userpolicy_info). Add an explicit
memset(0) before filling the buffer to avoid the heap info leak.

Signed-off-by: Mathias Krause <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/xfrm/xfrm_user.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index a0039040aba6..2f72480e6b8d 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1200,6 +1200,7 @@ static void copy_from_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy

static void copy_to_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_info *p, int dir)
{
+ memset(p, 0, sizeof(*p));
memcpy(&p->sel, &xp->selector, sizeof(p->sel));
memcpy(&p->lft, &xp->lft, sizeof(p->lft));
memcpy(&p->curlft, &xp->curlft, sizeof(p->curlft));
--
1.8.5.2

2014-02-05 21:01:17

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 068/213] xfrm_user: fix info leak in copy_to_user_state()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f778a636713a435d3a922c60b1622a91136560c1 upstream.

The memory reserved to dump the xfrm state includes the padding bytes of
struct xfrm_usersa_info added by the compiler for alignment (7 for
amd64, 3 for i386). Add an explicit memset(0) before filling the buffer
to avoid the info leak.

Signed-off-by: Mathias Krause <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/xfrm/xfrm_user.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 6106b72826d3..a0039040aba6 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -598,6 +598,7 @@ out:

static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
{
+ memset(p, 0, sizeof(*p));
memcpy(&p->id, &x->id, sizeof(p->id));
memcpy(&p->sel, &x->sel, sizeof(p->sel));
memcpy(&p->lft, &x->lft, sizeof(p->lft));
--
1.8.5.2

2014-02-05 20:05:15

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 055/213] netlink: fix races after skb queueing

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 4a7e7c2ad540e54c75489a70137bf0ec15d3a127 upstream.

As soon as an skb is queued into socket receive_queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/netlink/af_netlink.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ba9f6129ff32..1c889685d575 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -830,12 +830,19 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
return 0;
}

-int netlink_sendskb(struct sock *sk, struct sk_buff *skb)
+static int __netlink_sendskb(struct sock *sk, struct sk_buff *skb)
{
int len = skb->len;

skb_queue_tail(&sk->sk_receive_queue, skb);
sk->sk_data_ready(sk, len);
+ return len;
+}
+
+int netlink_sendskb(struct sock *sk, struct sk_buff *skb)
+{
+ int len = __netlink_sendskb(sk, skb);
+
sock_put(sk);
return len;
}
@@ -960,8 +967,7 @@ static inline int netlink_broadcast_deliver(struct sock *sk,
if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
!test_bit(0, &nlk->state)) {
skb_set_owner_r(skb, sk);
- skb_queue_tail(&sk->sk_receive_queue, skb);
- sk->sk_data_ready(sk, skb->len);
+ __netlink_sendskb(sk, skb);
return atomic_read(&sk->sk_rmem_alloc) > (sk->sk_rcvbuf >> 1);
}
return -1;
@@ -1685,10 +1691,8 @@ static int netlink_dump(struct sock *sk)

if (sk_filter(sk, skb))
kfree_skb(skb);
- else {
- skb_queue_tail(&sk->sk_receive_queue, skb);
- sk->sk_data_ready(sk, skb->len);
- }
+ else
+ __netlink_sendskb(sk, skb);
return 0;
}

@@ -1700,10 +1704,8 @@ static int netlink_dump(struct sock *sk)

if (sk_filter(sk, skb))
kfree_skb(skb);
- else {
- skb_queue_tail(&sk->sk_receive_queue, skb);
- sk->sk_data_ready(sk, skb->len);
- }
+ else
+ __netlink_sendskb(sk, skb);

if (cb->done)
cb->done(cb);
--
1.8.5.2

2014-02-05 21:02:04

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 061/213] tcp: perform DMA to userspace only if there is a task waiting for it

From: Jiri Kosina <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 59ea33a68a9083ac98515e4861c00e71efdc49a1 upstream.

Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.

The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.

This is not a problem under normal circumstances, but there is a corner
case where this doesn't work -- and that's when MSG_TRUNC flag to
recvmsg() is used.

If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.

This problem can be observed in real trivially -- for example 'tbench' is a
good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
have been already early-copied in tcp_rcv_established() using dma engine.

This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.

Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ipv4/tcp_input.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index add69a11754d..df5de7dd22d2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5315,7 +5315,9 @@ int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
if (tp->copied_seq == tp->rcv_nxt &&
len - tcp_header_len <= tp->ucopy.len) {
#ifdef CONFIG_NET_DMA
- if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
+ if (tp->ucopy.task == current &&
+ sock_owned_by_user(sk) &&
+ tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
copied_early = 1;
eaten = 1;
}
--
1.8.5.2

2014-02-05 21:02:28

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 062/213] tcp: drop SYN+FIN messages

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit fdf5af0daf8019cec2396cdef8fb042d80fe71fa upstream.

Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his
linux machines to their limits.

Dont call conn_request() if the TCP flags includes SYN flag

Reported-by: Denys Fedoryshchenko <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ipv4/tcp_input.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index df5de7dd22d2..20af12ae55a7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5737,6 +5737,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
goto discard;

if (th->syn) {
+ if (th->fin)
+ goto discard;
if (icsk->icsk_af_ops->conn_request(sk, skb) < 0)
return 1;

--
1.8.5.2

2014-02-05 20:05:10

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 047/213] net: sched: integer overflow fix

From: Stefan Hasko <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit d2fe85da52e89b8012ffad010ef352a964725d5f upstream.

Fixed integer overflow in function htb_dequeue

Signed-off-by: Stefan Hasko <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sched/sch_htb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 0b52b8de562c..efabd30425ec 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -866,7 +866,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
q->now = psched_get_time();
start_at = jiffies;

- next_event = q->now + 5 * PSCHED_TICKS_PER_SEC;
+ next_event = q->now + 5LLU * PSCHED_TICKS_PER_SEC;

for (level = 0; level < TC_HTB_MAXDEPTH; level++) {
/* common case optimization - skip event handler quickly */
--
1.8.5.2

2014-02-05 21:02:55

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 059/213] net/tun: fix ioctl() based info leaks

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit a117dacde0288f3ec60b6e5bcedae8fa37ee0dfc upstream.

The tun module leaks up to 36 bytes of memory by not fully initializing
a structure located on the stack that gets copied to user memory by the
TUNGETIFF and SIOCGIFHWADDR ioctl()s.

Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/net/tun.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7b4a88b2f696..c777d8ebdaa8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1177,9 +1177,11 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
int sndbuf;
int ret;

- if (cmd == TUNSETIFF || _IOC_TYPE(cmd) == 0x89)
+ if (cmd == TUNSETIFF || _IOC_TYPE(cmd) == 0x89) {
if (copy_from_user(&ifr, argp, ifreq_len))
return -EFAULT;
+ } else
+ memset(&ifr, 0, sizeof(ifr));

if (cmd == TUNGETFEATURES) {
/* Currently this just means: "what IFF flags are valid?".
--
1.8.5.2

2014-02-05 21:03:18

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 060/213] tun: Fix formatting.

From: "David S. Miller" <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 8bbb181308bc348e02bfdbebdedd4e4ec9d452ce upstream.

Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/net/tun.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c777d8ebdaa8..bb181e92ad2c 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1180,9 +1180,9 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
if (cmd == TUNSETIFF || _IOC_TYPE(cmd) == 0x89) {
if (copy_from_user(&ifr, argp, ifreq_len))
return -EFAULT;
- } else
+ } else {
memset(&ifr, 0, sizeof(ifr));
-
+ }
if (cmd == TUNGETFEATURES) {
/* Currently this just means: "what IFF flags are valid?".
* This is needed because we never checked for invalid flags on
--
1.8.5.2

2014-02-05 21:03:17

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 049/213] bridge: set priority of STP packets

From: Stephen Hemminger <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 547b4e718115eea74087e28d7fa70aec619200db upstream.

Spanning Tree Protocol packets should have always been marked as
control packets, this causes them to get queued in the high prirority
FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge
gets overloaded and can't communicate. This is a long-standing bug back
to the first versions of Linux bridge.

Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/bridge/br_stp_bpdu.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_stp_bpdu.c b/net/bridge/br_stp_bpdu.c
index edc7111b3db8..9fd76244adf6 100644
--- a/net/bridge/br_stp_bpdu.c
+++ b/net/bridge/br_stp_bpdu.c
@@ -16,6 +16,7 @@
#include <linux/etherdevice.h>
#include <linux/llc.h>
#include <linux/slab.h>
+#include <linux/pkt_sched.h>
#include <net/net_namespace.h>
#include <net/llc.h>
#include <net/llc_pdu.h>
@@ -40,6 +41,7 @@ static void br_send_bpdu(struct net_bridge_port *p,

skb->dev = p->dev;
skb->protocol = htons(ETH_P_802_2);
+ skb->priority = TC_PRIO_CONTROL;

skb_reserve(skb, LLC_RESERVE);
memcpy(__skb_put(skb, length), data, length);
--
1.8.5.2

2014-02-05 21:03:15

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 056/213] softirq: reduce latencies

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c10d73671ad30f54692f7f69f0e09e75d3a8926a upstream.

In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the fallback to ksoftirqd condition to :

- A time limit of 2 ms.
- need_resched() being set on current task

When one of this condition is met, we wakeup ksoftirqd for further
softirq processing if we still have pending softirqs.

Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude) :

In following bench, 200 antagonist "netperf -t TCP_RR" are started in
background, using all available cpus.

Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
IRQ (hard+soft)

Before patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

Signed-off-by: Eric Dumazet <[email protected]>
Cc: David Miller <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
kernel/softirq.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 7c1a67ef0274..0df9a9406271 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -178,21 +178,21 @@ void local_bh_enable_ip(unsigned long ip)
EXPORT_SYMBOL(local_bh_enable_ip);

/*
- * We restart softirq processing MAX_SOFTIRQ_RESTART times,
- * and we fall back to softirqd after that.
+ * We restart softirq processing for at most 2 ms,
+ * and if need_resched() is not set.
*
- * This number has been established via experimentation.
+ * These limits have been established via experimentation.
* The two things to balance is latency against fairness -
* we want to handle softirqs as soon as possible, but they
* should not be able to lock up the box.
*/
-#define MAX_SOFTIRQ_RESTART 10
+#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)

asmlinkage void __do_softirq(void)
{
struct softirq_action *h;
__u32 pending;
- int max_restart = MAX_SOFTIRQ_RESTART;
+ unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
int cpu;

pending = local_softirq_pending();
@@ -236,11 +236,12 @@ restart:
local_irq_disable();

pending = local_softirq_pending();
- if (pending && --max_restart)
- goto restart;
+ if (pending) {
+ if (time_before(jiffies, end) && !need_resched())
+ goto restart;

- if (pending)
wakeup_softirqd();
+ }

lockdep_softirq_exit();

--
1.8.5.2

2014-02-05 20:05:02

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 046/213] net: prevent setting ttl=0 via IP_TTL

From: Cong Wang <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c9be4a5c49cf51cc70a993f004c5bb30067a65ce upstream.

A regression is introduced by the following commit:

commit 4d52cfbef6266092d535237ba5a4b981458ab171
Author: Eric Dumazet <[email protected]>
Date: Tue Jun 2 00:42:16 2009 -0700

net: ipv4/ip_sockglue.c cleanups

Pure cleanups

but it is not a pure cleanup...

- if (val != -1 && (val < 1 || val>255))
+ if (val != -1 && (val < 0 || val > 255))

Since there is no reason provided to allow ttl=0, change it back.

Reported-by: nitin padalia <[email protected]>
Cc: nitin padalia <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: David S. Miller <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ipv4/ip_sockglue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index e4256fe59a30..3ca6fa7d1365 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -573,7 +573,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
case IP_TTL:
if (optlen < 1)
goto e_inval;
- if (val != -1 && (val < 0 || val > 255))
+ if (val != -1 && (val < 1 || val > 255))
goto e_inval;
inet->uc_ttl = val;
break;
--
1.8.5.2

2014-02-05 21:04:11

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 054/213] netlink: wake up netlink listeners sooner (v2)

From: stephen hemminger <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 2c64580046a122fa15bb586d8ca4fd5e4b69a1e7 upstream.

This patch changes it to yield sooner at halfway instead. Still not a cure-all
for listener overrun if listner is slow, but works much reliably.

Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/netlink/af_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index fe6313140563..ba9f6129ff32 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -962,7 +962,7 @@ static inline int netlink_broadcast_deliver(struct sock *sk,
skb_set_owner_r(skb, sk);
skb_queue_tail(&sk->sk_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);
- return atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf;
+ return atomic_read(&sk->sk_rmem_alloc) > (sk->sk_rcvbuf >> 1);
}
return -1;
}
--
1.8.5.2

2014-02-05 21:04:34

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 053/213] net: fix a race in sock_queue_err_skb()

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 110c43304db6f06490961529536c362d9ac5732f upstream.

As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[PG: net/core/skbuff.c --> include/net/sock.h on 2.6.34 baseline]
Signed-off-by: Paul Gortmaker <[email protected]>
---
include/net/sock.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b365fc2597c3..133e350c6fa3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1428,6 +1428,8 @@ extern int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);

static inline int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
{
+ int len = skb->len;
+
/* Cast skb->rcvbuf to unsigned... It's pointless, but reduces
number of warnings when compiling with -W --ANK
*/
@@ -1437,7 +1439,7 @@ static inline int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
skb_set_owner_r(skb, sk);
skb_queue_tail(&sk->sk_error_queue, skb);
if (!sock_flag(sk, SOCK_DEAD))
- sk->sk_data_ready(sk, skb->len);
+ sk->sk_data_ready(sk, len);
return 0;
}

--
1.8.5.2

2014-02-05 21:04:54

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 052/213] net_sched: gred: Fix oops in gred_dump() in WRED mode

From: David Ward <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 244b65dbfede788f2fa3fe2463c44d0809e97c6b upstream.

A parameter set exists for WRED mode, called wred_set, to hold the same
values for qavg and qidlestart across all VQs. The WRED mode values had
been previously held in the VQ for the default DP. After these values
were moved to wred_set, the VQ for the default DP was no longer created
automatically (so that it could be omitted on purpose, to have packets
in the default DP enqueued directly to the device without using RED).

However, gred_dump() was overlooked during that change; in WRED mode it
still reads qavg/qidlestart from the VQ for the default DP, which might
not even exist. As a result, this command sequence will cause an oops:

tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \
DPs 3 default 2 grio
tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS
tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS

This fixes gred_dump() in WRED mode to use the values held in wred_set.

Signed-off-by: David Ward <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[PG: in 2.6.34 it is tab[]->parms vs. tab[]->vars of 3.4 baseline]
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sched/sch_gred.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 51dcc2aa5c92..7d97904c7dae 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -545,11 +545,8 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb)
opt.packets = q->packetsin;
opt.bytesin = q->bytesin;

- if (gred_wred_mode(table)) {
- q->parms.qidlestart =
- table->tab[table->def]->parms.qidlestart;
- q->parms.qavg = table->tab[table->def]->parms.qavg;
- }
+ if (gred_wred_mode(table))
+ gred_load_wred_set(table, q);

opt.qave = red_calc_qavg(&q->parms, q->parms.qavg);

--
1.8.5.2

2014-02-05 21:05:16

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 051/213] netem: fix possible skb leak

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 116a0fc31c6c9b8fc821be5a96e5bf0b43260131 upstream.

skb_checksum_help(skb) can return an error, we must free skb in this
case. qdisc_drop(skb, sch) can also be feeded with a NULL skb (if
skb_unshare() failed), so lets use this generic helper.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sched/sch_netem.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 4714ff162bbd..e105245dac93 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -202,10 +202,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
(skb->ip_summed == CHECKSUM_PARTIAL &&
- skb_checksum_help(skb))) {
- sch->qstats.drops++;
- return NET_XMIT_DROP;
- }
+ skb_checksum_help(skb)))
+ return qdisc_drop(skb, sch);

skb->data[net_random() % skb_headlen(skb)] ^= 1<<(net_random() % 8);
}
--
1.8.5.2

2014-02-05 21:06:12

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 048/213] net_sched: gact: Fix potential panic in tcf_gact().

From: Hiroaki SHIMODA <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 696ecdc10622d86541f2e35cc16e15b6b3b1b67e upstream.

gact_rand array is accessed by gact->tcfg_ptype whose value
is assumed to less than MAX_RAND, but any range checks are
not performed.

So add a check in tcf_gact_init(). And in tcf_gact(), we can
reduce a branch.

Signed-off-by: Hiroaki SHIMODA <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sched/act_gact.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index f9fc6ec1fef6..faebd8a6da57 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -67,6 +67,9 @@ static int tcf_gact_init(struct nlattr *nla, struct nlattr *est,
struct tcf_common *pc;
int ret = 0;
int err;
+#ifdef CONFIG_GACT_PROB
+ struct tc_gact_p *p_parm = NULL;
+#endif

if (nla == NULL)
return -EINVAL;
@@ -82,6 +85,12 @@ static int tcf_gact_init(struct nlattr *nla, struct nlattr *est,
#ifndef CONFIG_GACT_PROB
if (tb[TCA_GACT_PROB] != NULL)
return -EOPNOTSUPP;
+#else
+ if (tb[TCA_GACT_PROB]) {
+ p_parm = nla_data(tb[TCA_GACT_PROB]);
+ if (p_parm->ptype >= MAX_RAND)
+ return -EINVAL;
+ }
#endif

pc = tcf_hash_check(parm->index, a, bind, &gact_hash_info);
@@ -103,8 +112,7 @@ static int tcf_gact_init(struct nlattr *nla, struct nlattr *est,
spin_lock_bh(&gact->tcf_lock);
gact->tcf_action = parm->action;
#ifdef CONFIG_GACT_PROB
- if (tb[TCA_GACT_PROB] != NULL) {
- struct tc_gact_p *p_parm = nla_data(tb[TCA_GACT_PROB]);
+ if (p_parm) {
gact->tcfg_paction = p_parm->paction;
gact->tcfg_pval = p_parm->pval;
gact->tcfg_ptype = p_parm->ptype;
@@ -132,7 +140,7 @@ static int tcf_gact(struct sk_buff *skb, struct tc_action *a, struct tcf_result

spin_lock(&gact->tcf_lock);
#ifdef CONFIG_GACT_PROB
- if (gact->tcfg_ptype && gact_rand[gact->tcfg_ptype] != NULL)
+ if (gact->tcfg_ptype)
action = gact_rand[gact->tcfg_ptype](gact);
else
action = gact->tcf_action;
--
1.8.5.2

2014-02-05 21:06:09

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 040/213] tcp: tcp_sendpages() should call tcp_push() once

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 upstream.

commit 2f533844242 (tcp: allow splice() to build full TSO packets) added
a regression for splice() calls using SPLICE_F_MORE.

We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall.

Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
with different semantic.

For all sendpage() providers, its a transparent change. Only
sock_sendpage() and tcp_sendpages() can differentiate the two different
flags provided by pipe_to_sendpage()

Reported-by: Tom Herbert <[email protected]>
Cc: Nandita Dukkipati <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: H.K. Jerry Chu <[email protected]>
Cc: Maciej Żenczykowski <[email protected]>
Cc: Mahesh Bandewar <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/splice.c | 5 ++++-
include/linux/socket.h | 2 +-
net/ipv4/tcp.c | 2 +-
net/socket.c | 6 +++---
4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index cc617b09e4c2..3bec7c63be64 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -31,6 +31,7 @@
#include <linux/uio.h>
#include <linux/security.h>
#include <linux/gfp.h>
+#include <linux/socket.h>

/*
* Attempt to steal a page from a pipe buffer. This should perhaps go into
@@ -638,7 +639,9 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe,

ret = buf->ops->confirm(pipe, buf);
if (!ret) {
- more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len;
+ more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
+ if (sd->len < sd->total_len)
+ more |= MSG_SENDPAGE_NOTLAST;
if (file->f_op && file->f_op->sendpage)
ret = file->f_op->sendpage(file, buf->page, buf->offset,
sd->len, &pos, more);
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 354cc5617f8b..7cfb4f881644 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -256,7 +256,7 @@ struct ucred {
#define MSG_NOSIGNAL 0x4000 /* Do not generate SIGPIPE */
#define MSG_MORE 0x8000 /* Sender will send more */
#define MSG_WAITFORONE 0x10000 /* recvmmsg(): block until 1+ packets avail */
-
+#define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
#define MSG_EOF MSG_FIN

#define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exit for file
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index cea0a9223c5d..df671c76f196 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -849,7 +849,7 @@ wait_for_memory:
}

out:
- if (copied && !(flags & MSG_MORE))
+ if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
tcp_push(sk, flags, mss_now, tp->nonagle);
return copied;

diff --git a/net/socket.c b/net/socket.c
index c63ebf4e31b5..c802797e3a4a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -746,9 +746,9 @@ static ssize_t sock_sendpage(struct file *file, struct page *page,

sock = file->private_data;

- flags = !(file->f_flags & O_NONBLOCK) ? 0 : MSG_DONTWAIT;
- if (more)
- flags |= MSG_MORE;
+ flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
+ /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */
+ flags |= more;

return kernel_sendpage(sock, page, offset, size, flags);
}
--
1.8.5.2

2014-02-05 21:06:49

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 044/213] net: guard tcp_set_keepalive() to tcp sockets

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 3e10986d1d698140747fcfc2761ec9cb64c1d582 upstream.

Its possible to use RAW sockets to get a crash in
tcp_set_keepalive() / sk_reset_timer()

Fix is to make sure socket is a SOCK_STREAM one.

Reported-by: Dave Jones <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/core/sock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 4b45ad8f6b9e..8ad61ba17a39 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -582,7 +582,8 @@ set_rcvbuf:

case SO_KEEPALIVE:
#ifdef CONFIG_INET
- if (sk->sk_protocol == IPPROTO_TCP)
+ if (sk->sk_protocol == IPPROTO_TCP &&
+ sk->sk_type == SOCK_STREAM)
tcp_set_keepalive(sk, valbool);
#endif
sock_valbool_flag(sk, SOCK_KEEPOPEN, valbool);
--
1.8.5.2

2014-02-05 21:07:18

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 042/213] tcp: preserve ACK clocking in TSO

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f4541d60a449afd40448b06496dcd510f505928e upstream.

A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.

Current code leads to following bad bursty behavior :

20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

While the expected behavior is more like :

20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Van Jacobson <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Nandita Dukkipati <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ipv4/tcp_output.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6fdff3045b62..407a6ff44415 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1566,8 +1566,11 @@ static int tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
goto send_now;
}

- /* Ok, it looks like it is advisable to defer. */
- tp->tso_deferred = 1 | (jiffies << 1);
+ /* Ok, it looks like it is advisable to defer.
+ * Do not rearm the timer if already set to not break TCP ACK clocking.
+ */
+ if (!tp->tso_deferred)
+ tp->tso_deferred = 1 | (jiffies << 1);

return 1;

--
1.8.5.2

2014-02-05 21:07:16

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 043/213] net: fix info leak in compat dev_ifconf()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 43da5f2e0d0c69ded3d51907d9552310a6b545e8 upstream.

The implementation of dev_ifconf() for the compat ioctl interface uses
an intermediate ifc structure allocated in userland for the duration of
the syscall. Though, it fails to initialize the padding bytes inserted
for alignment and that for leaks four bytes of kernel stack. Add an
explicit memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/socket.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/socket.c b/net/socket.c
index c802797e3a4a..b0d3b6a025ea 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2508,6 +2508,7 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32)
if (copy_from_user(&ifc32, uifc32, sizeof(struct compat_ifconf)))
return -EFAULT;

+ memset(&ifc, 0, sizeof(ifc));
if (ifc32.ifcbuf == 0) {
ifc32.ifc_len = 0;
ifc.ifc_len = 0;
--
1.8.5.2

2014-02-05 20:04:11

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 034/213] net: sctp: sctp_auth_key_put: use kzfree instead of kfree

From: Daniel Borkmann <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 586c31f3bf04c290dc0a0de7fc91d20aa9a5ee53 upstream.

For sensitive data like keying material, it is common practice to zero
out keys before returning the memory back to the allocator. Thus, use
kzfree instead of kfree.

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sctp/auth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index ddbbf7c81fa1..ce9ef56708ac 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -71,7 +71,7 @@ void sctp_auth_key_put(struct sctp_auth_bytes *key)
return;

if (atomic_dec_and_test(&key->refcnt)) {
- kfree(key);
+ kzfree(key);
SCTP_DBG_OBJCNT_DEC(keys);
}
}
--
1.8.5.2

2014-02-05 20:04:04

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 024/213] llc: fix info leak via getsockname()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 3592aaeb80290bda0f2cf0b5456c97bfc638b192 upstream.

The LLC code wrongly returns 0, i.e. "success", when the socket is
zapped. Together with the uninitialized uaddrlen pointer argument from
sys_getsockname this leads to an arbitrary memory leak of up to 128
bytes kernel stack via the getsockname() syscall.

Return an error instead when the socket is zapped to prevent the info
leak. Also remove the unnecessary memset(0). We don't directly write to
the memory pointed by uaddr but memcpy() a local structure at the end of
the function that is properly initialized.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/llc/af_llc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index ad4296c852eb..06010e1e89f9 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -959,14 +959,13 @@ static int llc_ui_getname(struct socket *sock, struct sockaddr *uaddr,
struct sockaddr_llc sllc;
struct sock *sk = sock->sk;
struct llc_sock *llc = llc_sk(sk);
- int rc = 0;
+ int rc = -EBADF;

memset(&sllc, 0, sizeof(sllc));
lock_sock(sk);
if (sock_flag(sk, SOCK_ZAPPED))
goto out;
*uaddrlen = sizeof(sllc);
- memset(uaddr, 0, *uaddrlen);
if (peer) {
rc = -ENOTCONN;
if (sk->sk_state != TCP_ESTABLISHED)
--
1.8.5.2

2014-02-05 21:09:38

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 038/213] unix: fix a race condition in unix_release()

From: Paul Moore <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit ded34e0fe8fe8c2d595bfa30626654e4b87621e0 upstream.

As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned. This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned. This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.

Dave, I think this one should go on the -stable pile.

Special thanks to Jan for coming up with a reproducer for this
problem.

Reported-by: Jan Stancek <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/unix/af_unix.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0b7148bed6b3..072835bc61fa 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -370,7 +370,7 @@ static void unix_sock_destructor(struct sock *sk)
#endif
}

-static int unix_release_sock(struct sock *sk, int embrion)
+static void unix_release_sock(struct sock *sk, int embrion)
{
struct unix_sock *u = unix_sk(sk);
struct dentry *dentry;
@@ -445,8 +445,6 @@ static int unix_release_sock(struct sock *sk, int embrion)

if (unix_tot_inflight)
unix_gc(); /* Garbage collect fds */
-
- return 0;
}

static int unix_listen(struct socket *sock, int backlog)
@@ -661,9 +659,10 @@ static int unix_release(struct socket *sock)
if (!sk)
return 0;

+ unix_release_sock(sk, 0);
sock->sk = NULL;

- return unix_release_sock(sk, 0);
+ return 0;
}

static int unix_autobind(struct socket *sock)
--
1.8.5.2

2014-02-05 21:09:59

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 005/213] inet: add RCU protection to inet->opt

From: Eric Dumazet <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit f6d8bd051c391c1c0458a30b2a7abcd939329259 upstream.

We lack proper synchronization to manipulate inet->opt ip_options

Problem is ip_make_skb() calls ip_setup_cork() and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We cant insert an rcu_head in struct ip_options since its included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[dannf/bwh: backported to Debian's 2.6.32]
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
[PG: use 2.6.32 patch, since it is closer to 2.6.34 than original
baseline; drop net/l2tp/l2tp_ip.c chunk as we don't have that file]
Signed-off-by: Paul Gortmaker <[email protected]>
---
include/net/inet_sock.h | 14 +++--
include/net/ip.h | 11 ++--
net/dccp/ipv4.c | 15 +++---
net/dccp/ipv6.c | 2 +-
net/ipv4/af_inet.c | 16 ++++--
net/ipv4/cipso_ipv4.c | 113 ++++++++++++++++++++++------------------
net/ipv4/icmp.c | 23 ++++----
net/ipv4/inet_connection_sock.c | 8 +--
net/ipv4/ip_options.c | 38 +++++++-------
net/ipv4/ip_output.c | 50 +++++++++---------
net/ipv4/ip_sockglue.c | 33 ++++++++----
net/ipv4/raw.c | 19 +++++--
net/ipv4/syncookies.c | 4 +-
net/ipv4/tcp_ipv4.c | 33 +++++++-----
net/ipv4/udp.c | 21 ++++++--
net/ipv6/tcp_ipv6.c | 2 +-
16 files changed, 235 insertions(+), 167 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 83fd34437cf1..648000dd4b97 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -56,7 +56,15 @@ struct ip_options {
unsigned char __data[0];
};

-#define optlength(opt) (sizeof(struct ip_options) + opt->optlen)
+struct ip_options_rcu {
+ struct rcu_head rcu;
+ struct ip_options opt;
+};
+
+struct ip_options_data {
+ struct ip_options_rcu opt;
+ char data[40];
+};

struct inet_request_sock {
struct request_sock req;
@@ -77,7 +85,7 @@ struct inet_request_sock {
acked : 1,
no_srccheck: 1;
kmemcheck_bitfield_end(flags);
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
};

static inline struct inet_request_sock *inet_rsk(const struct request_sock *sk)
@@ -125,7 +133,7 @@ struct inet_sock {
__be16 inet_sport;
__u16 inet_id;

- struct ip_options *opt;
+ struct ip_options_rcu *inet_opt;
__u8 tos;
__u8 min_ttl;
__u8 mc_ttl;
diff --git a/include/net/ip.h b/include/net/ip.h
index 503994a38ed1..ac9506e74c29 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -52,7 +52,7 @@ static inline unsigned int ip_hdrlen(const struct sk_buff *skb)
struct ipcm_cookie {
__be32 addr;
int oif;
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
union skb_shared_tx shtx;
};

@@ -89,7 +89,7 @@ extern int igmp_mc_proc_init(void);

extern int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
__be32 saddr, __be32 daddr,
- struct ip_options *opt);
+ struct ip_options_rcu *opt);
extern int ip_rcv(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev);
extern int ip_local_deliver(struct sk_buff *skb);
@@ -376,14 +376,15 @@ extern int ip_forward(struct sk_buff *skb);
* Functions provided by ip_options.c
*/

-extern void ip_options_build(struct sk_buff *skb, struct ip_options *opt, __be32 daddr, struct rtable *rt, int is_frag);
+extern void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
+ __be32 daddr, struct rtable *rt, int is_frag);
extern int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb);
extern void ip_options_fragment(struct sk_buff *skb);
extern int ip_options_compile(struct net *net,
struct ip_options *opt, struct sk_buff *skb);
-extern int ip_options_get(struct net *net, struct ip_options **optp,
+extern int ip_options_get(struct net *net, struct ip_options_rcu **optp,
unsigned char *data, int optlen);
-extern int ip_options_get_from_user(struct net *net, struct ip_options **optp,
+extern int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp,
unsigned char __user *data, int optlen);
extern void ip_options_undo(struct ip_options * opt);
extern void ip_forward_options(struct sk_buff *skb);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index e072e018b068..d73f17ff95f6 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -48,6 +48,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
__be32 daddr, nexthop;
int tmp;
int err;
+ struct ip_options_rcu *inet_opt;

dp->dccps_role = DCCP_ROLE_CLIENT;

@@ -58,10 +59,12 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return -EAFNOSUPPORT;

nexthop = daddr = usin->sin_addr.s_addr;
- if (inet->opt != NULL && inet->opt->srr) {
+
+ inet_opt = inet->inet_opt;
+ if (inet_opt != NULL && inet_opt->opt.srr) {
if (daddr == 0)
return -EINVAL;
- nexthop = inet->opt->faddr;
+ nexthop = inet_opt->opt.faddr;
}

tmp = ip_route_connect(&rt, nexthop, inet->inet_saddr,
@@ -76,7 +79,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return -ENETUNREACH;
}

- if (inet->opt == NULL || !inet->opt->srr)
+ if (inet_opt == NULL || !inet_opt->opt.srr)
daddr = rt->rt_dst;

if (inet->inet_saddr == 0)
@@ -87,8 +90,8 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
inet->inet_daddr = daddr;

inet_csk(sk)->icsk_ext_hdr_len = 0;
- if (inet->opt != NULL)
- inet_csk(sk)->icsk_ext_hdr_len = inet->opt->optlen;
+ if (inet_opt)
+ inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
/*
* Socket identity is still unknown (sport may be zero).
* However we set state to DCCP_REQUESTING and not releasing socket
@@ -402,7 +405,7 @@ struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb,
newinet->inet_daddr = ireq->rmt_addr;
newinet->inet_rcv_saddr = ireq->loc_addr;
newinet->inet_saddr = ireq->loc_addr;
- newinet->opt = ireq->opt;
+ newinet->inet_opt = ireq->opt;
ireq->opt = NULL;
newinet->mc_index = inet_iif(skb);
newinet->mc_ttl = ip_hdr(skb)->ttl;
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index fec7de6cfe6e..90f65e1dd4cf 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -600,7 +600,7 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,

First: no IPv4 options.
*/
- newinet->opt = NULL;
+ newinet->inet_opt = NULL;

/* Clone RX bits */
newnp->rxopt.all = np->rxopt.all;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8897b3c7d05a..4dd4aad58d93 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -153,7 +153,7 @@ void inet_sock_destruct(struct sock *sk)
WARN_ON(sk->sk_wmem_queued);
WARN_ON(sk->sk_forward_alloc);

- kfree(inet->opt);
+ kfree(inet->inet_opt);
dst_release(sk->sk_dst_cache);
sk_refcnt_debug_dec(sk);
}
@@ -1069,9 +1069,11 @@ static int inet_sk_reselect_saddr(struct sock *sk)
__be32 old_saddr = inet->inet_saddr;
__be32 new_saddr;
__be32 daddr = inet->inet_daddr;
+ struct ip_options_rcu *inet_opt;

- if (inet->opt && inet->opt->srr)
- daddr = inet->opt->faddr;
+ inet_opt = inet->inet_opt;
+ if (inet_opt && inet_opt->opt.srr)
+ daddr = inet_opt->opt.faddr;

/* Query new route. */
err = ip_route_connect(&rt, daddr, 0,
@@ -1113,6 +1115,7 @@ int inet_sk_rebuild_header(struct sock *sk)
struct inet_sock *inet = inet_sk(sk);
struct rtable *rt = (struct rtable *)__sk_dst_check(sk, 0);
__be32 daddr;
+ struct ip_options_rcu *inet_opt;
int err;

/* Route is OK, nothing to do. */
@@ -1120,9 +1123,12 @@ int inet_sk_rebuild_header(struct sock *sk)
return 0;

/* Reroute. */
+ rcu_read_lock();
+ inet_opt = rcu_dereference(inet->inet_opt);
daddr = inet->inet_daddr;
- if (inet->opt && inet->opt->srr)
- daddr = inet->opt->faddr;
+ if (inet_opt && inet_opt->opt.srr)
+ daddr = inet_opt->opt.faddr;
+ rcu_read_unlock();
{
struct flowi fl = {
.oif = sk->sk_bound_dev_if,
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index c97cd9ff697e..d5ef60963183 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1859,6 +1859,11 @@ static int cipso_v4_genopt(unsigned char *buf, u32 buf_len,
return CIPSO_V4_HDR_LEN + ret_val;
}

+static void opt_kfree_rcu(struct rcu_head *head)
+{
+ kfree(container_of(head, struct ip_options_rcu, rcu));
+}
+
/**
* cipso_v4_sock_setattr - Add a CIPSO option to a socket
* @sk: the socket
@@ -1881,7 +1886,7 @@ int cipso_v4_sock_setattr(struct sock *sk,
unsigned char *buf = NULL;
u32 buf_len;
u32 opt_len;
- struct ip_options *opt = NULL;
+ struct ip_options_rcu *old, *opt = NULL;
struct inet_sock *sk_inet;
struct inet_connection_sock *sk_conn;

@@ -1917,22 +1922,25 @@ int cipso_v4_sock_setattr(struct sock *sk,
ret_val = -ENOMEM;
goto socket_setattr_failure;
}
- memcpy(opt->__data, buf, buf_len);
- opt->optlen = opt_len;
- opt->cipso = sizeof(struct iphdr);
+ memcpy(opt->opt.__data, buf, buf_len);
+ opt->opt.optlen = opt_len;
+ opt->opt.cipso = sizeof(struct iphdr);
kfree(buf);
buf = NULL;

sk_inet = inet_sk(sk);
+
+ old = sk_inet->inet_opt;
if (sk_inet->is_icsk) {
sk_conn = inet_csk(sk);
- if (sk_inet->opt)
- sk_conn->icsk_ext_hdr_len -= sk_inet->opt->optlen;
- sk_conn->icsk_ext_hdr_len += opt->optlen;
+ if (old)
+ sk_conn->icsk_ext_hdr_len -= old->opt.optlen;
+ sk_conn->icsk_ext_hdr_len += opt->opt.optlen;
sk_conn->icsk_sync_mss(sk, sk_conn->icsk_pmtu_cookie);
}
- opt = xchg(&sk_inet->opt, opt);
- kfree(opt);
+ rcu_assign_pointer(sk_inet->inet_opt, opt);
+ if (old)
+ call_rcu(&old->rcu, opt_kfree_rcu);

return 0;

@@ -1962,7 +1970,7 @@ int cipso_v4_req_setattr(struct request_sock *req,
unsigned char *buf = NULL;
u32 buf_len;
u32 opt_len;
- struct ip_options *opt = NULL;
+ struct ip_options_rcu *opt = NULL;
struct inet_request_sock *req_inet;

/* We allocate the maximum CIPSO option size here so we are probably
@@ -1990,15 +1998,16 @@ int cipso_v4_req_setattr(struct request_sock *req,
ret_val = -ENOMEM;
goto req_setattr_failure;
}
- memcpy(opt->__data, buf, buf_len);
- opt->optlen = opt_len;
- opt->cipso = sizeof(struct iphdr);
+ memcpy(opt->opt.__data, buf, buf_len);
+ opt->opt.optlen = opt_len;
+ opt->opt.cipso = sizeof(struct iphdr);
kfree(buf);
buf = NULL;

req_inet = inet_rsk(req);
opt = xchg(&req_inet->opt, opt);
- kfree(opt);
+ if (opt)
+ call_rcu(&opt->rcu, opt_kfree_rcu);

return 0;

@@ -2018,34 +2027,34 @@ req_setattr_failure:
* values on failure.
*
*/
-static int cipso_v4_delopt(struct ip_options **opt_ptr)
+static int cipso_v4_delopt(struct ip_options_rcu **opt_ptr)
{
int hdr_delta = 0;
- struct ip_options *opt = *opt_ptr;
+ struct ip_options_rcu *opt = *opt_ptr;

- if (opt->srr || opt->rr || opt->ts || opt->router_alert) {
+ if (opt->opt.srr || opt->opt.rr || opt->opt.ts || opt->opt.router_alert) {
u8 cipso_len;
u8 cipso_off;
unsigned char *cipso_ptr;
int iter;
int optlen_new;

- cipso_off = opt->cipso - sizeof(struct iphdr);
- cipso_ptr = &opt->__data[cipso_off];
+ cipso_off = opt->opt.cipso - sizeof(struct iphdr);
+ cipso_ptr = &opt->opt.__data[cipso_off];
cipso_len = cipso_ptr[1];

- if (opt->srr > opt->cipso)
- opt->srr -= cipso_len;
- if (opt->rr > opt->cipso)
- opt->rr -= cipso_len;
- if (opt->ts > opt->cipso)
- opt->ts -= cipso_len;
- if (opt->router_alert > opt->cipso)
- opt->router_alert -= cipso_len;
- opt->cipso = 0;
+ if (opt->opt.srr > opt->opt.cipso)
+ opt->opt.srr -= cipso_len;
+ if (opt->opt.rr > opt->opt.cipso)
+ opt->opt.rr -= cipso_len;
+ if (opt->opt.ts > opt->opt.cipso)
+ opt->opt.ts -= cipso_len;
+ if (opt->opt.router_alert > opt->opt.cipso)
+ opt->opt.router_alert -= cipso_len;
+ opt->opt.cipso = 0;

memmove(cipso_ptr, cipso_ptr + cipso_len,
- opt->optlen - cipso_off - cipso_len);
+ opt->opt.optlen - cipso_off - cipso_len);

/* determining the new total option length is tricky because of
* the padding necessary, the only thing i can think to do at
@@ -2054,21 +2063,21 @@ static int cipso_v4_delopt(struct ip_options **opt_ptr)
* from there we can determine the new total option length */
iter = 0;
optlen_new = 0;
- while (iter < opt->optlen)
- if (opt->__data[iter] != IPOPT_NOP) {
- iter += opt->__data[iter + 1];
+ while (iter < opt->opt.optlen)
+ if (opt->opt.__data[iter] != IPOPT_NOP) {
+ iter += opt->opt.__data[iter + 1];
optlen_new = iter;
} else
iter++;
- hdr_delta = opt->optlen;
- opt->optlen = (optlen_new + 3) & ~3;
- hdr_delta -= opt->optlen;
+ hdr_delta = opt->opt.optlen;
+ opt->opt.optlen = (optlen_new + 3) & ~3;
+ hdr_delta -= opt->opt.optlen;
} else {
/* only the cipso option was present on the socket so we can
* remove the entire option struct */
*opt_ptr = NULL;
- hdr_delta = opt->optlen;
- kfree(opt);
+ hdr_delta = opt->opt.optlen;
+ call_rcu(&opt->rcu, opt_kfree_rcu);
}

return hdr_delta;
@@ -2085,15 +2094,15 @@ static int cipso_v4_delopt(struct ip_options **opt_ptr)
void cipso_v4_sock_delattr(struct sock *sk)
{
int hdr_delta;
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
struct inet_sock *sk_inet;

sk_inet = inet_sk(sk);
- opt = sk_inet->opt;
- if (opt == NULL || opt->cipso == 0)
+ opt = sk_inet->inet_opt;
+ if (opt == NULL || opt->opt.cipso == 0)
return;

- hdr_delta = cipso_v4_delopt(&sk_inet->opt);
+ hdr_delta = cipso_v4_delopt(&sk_inet->inet_opt);
if (sk_inet->is_icsk && hdr_delta > 0) {
struct inet_connection_sock *sk_conn = inet_csk(sk);
sk_conn->icsk_ext_hdr_len -= hdr_delta;
@@ -2111,12 +2120,12 @@ void cipso_v4_sock_delattr(struct sock *sk)
*/
void cipso_v4_req_delattr(struct request_sock *req)
{
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
struct inet_request_sock *req_inet;

req_inet = inet_rsk(req);
opt = req_inet->opt;
- if (opt == NULL || opt->cipso == 0)
+ if (opt == NULL || opt->opt.cipso == 0)
return;

cipso_v4_delopt(&req_inet->opt);
@@ -2186,14 +2195,18 @@ getattr_return:
*/
int cipso_v4_sock_getattr(struct sock *sk, struct netlbl_lsm_secattr *secattr)
{
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
+ int res = -ENOMSG;

- opt = inet_sk(sk)->opt;
- if (opt == NULL || opt->cipso == 0)
- return -ENOMSG;
-
- return cipso_v4_getattr(opt->__data + opt->cipso - sizeof(struct iphdr),
- secattr);
+ rcu_read_lock();
+ opt = rcu_dereference(inet_sk(sk)->inet_opt);
+ if (opt && opt->opt.cipso)
+ res = cipso_v4_getattr(opt->opt.__data +
+ opt->opt.cipso -
+ sizeof(struct iphdr),
+ secattr);
+ rcu_read_unlock();
+ return res;
}

/**
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index ac4dec132735..4a5137a9e24c 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -108,8 +108,7 @@ struct icmp_bxm {
__be32 times[3];
} data;
int head_len;
- struct ip_options replyopts;
- unsigned char optbuf[40];
+ struct ip_options_data replyopts;
};

/* An array of errno for error messages from dest unreach. */
@@ -363,7 +362,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
struct inet_sock *inet;
__be32 daddr;

- if (ip_options_echo(&icmp_param->replyopts, skb))
+ if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb))
return;

sk = icmp_xmit_lock(net);
@@ -377,10 +376,10 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
ipc.shtx.flags = 0;
- if (icmp_param->replyopts.optlen) {
- ipc.opt = &icmp_param->replyopts;
- if (ipc.opt->srr)
- daddr = icmp_param->replyopts.faddr;
+ if (icmp_param->replyopts.opt.opt.optlen) {
+ ipc.opt = &icmp_param->replyopts.opt;
+ if (ipc.opt->opt.srr)
+ daddr = icmp_param->replyopts.opt.opt.faddr;
}
{
struct flowi fl = { .nl_u = { .ip4_u =
@@ -518,7 +517,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
IPTOS_PREC_INTERNETCONTROL) :
iph->tos;

- if (ip_options_echo(&icmp_param.replyopts, skb_in))
+ if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
goto out_unlock;


@@ -534,15 +533,15 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
icmp_param.offset = skb_network_offset(skb_in);
inet_sk(sk)->tos = tos;
ipc.addr = iph->saddr;
- ipc.opt = &icmp_param.replyopts;
+ ipc.opt = &icmp_param.replyopts.opt;
ipc.shtx.flags = 0;

{
struct flowi fl = {
.nl_u = {
.ip4_u = {
- .daddr = icmp_param.replyopts.srr ?
- icmp_param.replyopts.faddr :
+ .daddr = icmp_param.replyopts.opt.opt.srr ?
+ icmp_param.replyopts.opt.opt.faddr :
iph->saddr,
.saddr = saddr,
.tos = RT_TOS(tos)
@@ -631,7 +630,7 @@ route_done:
room = dst_mtu(&rt->u.dst);
if (room > 576)
room = 576;
- room -= sizeof(struct iphdr) + icmp_param.replyopts.optlen;
+ room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
room -= sizeof(struct icmphdr);

icmp_param.data_len = skb_in->len - icmp_param.offset;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 8da6429269dd..9f57d0f75631 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -356,12 +356,12 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
{
struct rtable *rt;
const struct inet_request_sock *ireq = inet_rsk(req);
- struct ip_options *opt = inet_rsk(req)->opt;
+ struct ip_options_rcu *opt = inet_rsk(req)->opt;
struct flowi fl = { .oif = sk->sk_bound_dev_if,
.mark = sk->sk_mark,
.nl_u = { .ip4_u =
- { .daddr = ((opt && opt->srr) ?
- opt->faddr :
+ { .daddr = ((opt && opt->opt.srr) ?
+ opt->opt.faddr :
ireq->rmt_addr),
.saddr = ireq->loc_addr,
.tos = RT_CONN_FLAGS(sk) } },
@@ -375,7 +375,7 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
security_req_classify_flow(req, &fl);
if (ip_route_output_flow(net, &rt, &fl, sk, 0))
goto no_route;
- if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
+ if (opt && opt->opt.is_strictroute && rt->rt_dst != rt->rt_gateway)
goto route_err;
return &rt->u.dst;

diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 4c09a31fd140..f4281aad1df0 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -36,7 +36,7 @@
* saddr is address of outgoing interface.
*/

-void ip_options_build(struct sk_buff * skb, struct ip_options * opt,
+void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
__be32 daddr, struct rtable *rt, int is_frag)
{
unsigned char *iph = skb_network_header(skb);
@@ -83,9 +83,9 @@ void ip_options_build(struct sk_buff * skb, struct ip_options * opt,
* NOTE: dopt cannot point to skb.
*/

-int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
+int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb)
{
- struct ip_options *sopt;
+ const struct ip_options *sopt;
unsigned char *sptr, *dptr;
int soffset, doffset;
int optlen;
@@ -95,10 +95,8 @@ int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)

sopt = &(IPCB(skb)->opt);

- if (sopt->optlen == 0) {
- dopt->optlen = 0;
+ if (sopt->optlen == 0)
return 0;
- }

sptr = skb_network_header(skb);
dptr = dopt->__data;
@@ -157,7 +155,7 @@ int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
dopt->optlen += optlen;
}
if (sopt->srr) {
- unsigned char * start = sptr+sopt->srr;
+ unsigned char *start = sptr+sopt->srr;
__be32 faddr;

optlen = start[1];
@@ -500,19 +498,19 @@ void ip_options_undo(struct ip_options * opt)
}
}

-static struct ip_options *ip_options_get_alloc(const int optlen)
+static struct ip_options_rcu *ip_options_get_alloc(const int optlen)
{
- return kzalloc(sizeof(struct ip_options) + ((optlen + 3) & ~3),
+ return kzalloc(sizeof(struct ip_options_rcu) + ((optlen + 3) & ~3),
GFP_KERNEL);
}

-static int ip_options_get_finish(struct net *net, struct ip_options **optp,
- struct ip_options *opt, int optlen)
+static int ip_options_get_finish(struct net *net, struct ip_options_rcu **optp,
+ struct ip_options_rcu *opt, int optlen)
{
while (optlen & 3)
- opt->__data[optlen++] = IPOPT_END;
- opt->optlen = optlen;
- if (optlen && ip_options_compile(net, opt, NULL)) {
+ opt->opt.__data[optlen++] = IPOPT_END;
+ opt->opt.optlen = optlen;
+ if (optlen && ip_options_compile(net, &opt->opt, NULL)) {
kfree(opt);
return -EINVAL;
}
@@ -521,29 +519,29 @@ static int ip_options_get_finish(struct net *net, struct ip_options **optp,
return 0;
}

-int ip_options_get_from_user(struct net *net, struct ip_options **optp,
+int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp,
unsigned char __user *data, int optlen)
{
- struct ip_options *opt = ip_options_get_alloc(optlen);
+ struct ip_options_rcu *opt = ip_options_get_alloc(optlen);

if (!opt)
return -ENOMEM;
- if (optlen && copy_from_user(opt->__data, data, optlen)) {
+ if (optlen && copy_from_user(opt->opt.__data, data, optlen)) {
kfree(opt);
return -EFAULT;
}
return ip_options_get_finish(net, optp, opt, optlen);
}

-int ip_options_get(struct net *net, struct ip_options **optp,
+int ip_options_get(struct net *net, struct ip_options_rcu **optp,
unsigned char *data, int optlen)
{
- struct ip_options *opt = ip_options_get_alloc(optlen);
+ struct ip_options_rcu *opt = ip_options_get_alloc(optlen);

if (!opt)
return -ENOMEM;
if (optlen)
- memcpy(opt->__data, data, optlen);
+ memcpy(opt->opt.__data, data, optlen);
return ip_options_get_finish(net, optp, opt, optlen);
}

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d52fe4bd573f..e669da63be31 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -138,14 +138,14 @@ static inline int ip_select_ttl(struct inet_sock *inet, struct dst_entry *dst)
*
*/
int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
- __be32 saddr, __be32 daddr, struct ip_options *opt)
+ __be32 saddr, __be32 daddr, struct ip_options_rcu *opt)
{
struct inet_sock *inet = inet_sk(sk);
struct rtable *rt = skb_rtable(skb);
struct iphdr *iph;

/* Build the IP header. */
- skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
+ skb_push(skb, sizeof(struct iphdr) + (opt ? opt->opt.optlen : 0));
skb_reset_network_header(skb);
iph = ip_hdr(skb);
iph->version = 4;
@@ -161,9 +161,9 @@ int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
iph->protocol = sk->sk_protocol;
ip_select_ident(iph, &rt->u.dst, sk);

- if (opt && opt->optlen) {
- iph->ihl += opt->optlen>>2;
- ip_options_build(skb, opt, daddr, rt, 0);
+ if (opt && opt->opt.optlen) {
+ iph->ihl += opt->opt.optlen>>2;
+ ip_options_build(skb, &opt->opt, daddr, rt, 0);
}

skb->priority = sk->sk_priority;
@@ -315,9 +315,10 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
{
struct sock *sk = skb->sk;
struct inet_sock *inet = inet_sk(sk);
- struct ip_options *opt = inet->opt;
+ struct ip_options_rcu *inet_opt = NULL;
struct rtable *rt;
struct iphdr *iph;
+ int res;

/* Skip all of this if the packet is already routed,
* f.e. by something like SCTP.
@@ -328,13 +329,15 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)

/* Make sure we can route this packet. */
rt = (struct rtable *)__sk_dst_check(sk, 0);
+ rcu_read_lock();
+ inet_opt = rcu_dereference(inet->inet_opt);
if (rt == NULL) {
__be32 daddr;

/* Use correct destination address if we have options. */
daddr = inet->inet_daddr;
- if(opt && opt->srr)
- daddr = opt->faddr;
+ if (inet_opt && inet_opt->opt.srr)
+ daddr = inet_opt->opt.faddr;

{
struct flowi fl = { .oif = sk->sk_bound_dev_if,
@@ -362,11 +365,11 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
skb_dst_set(skb, dst_clone(&rt->u.dst));

packet_routed:
- if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
+ if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_dst != rt->rt_gateway)
goto no_route;

/* OK, we know where to send it, allocate and build IP header. */
- skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
+ skb_push(skb, sizeof(struct iphdr) + (inet_opt ? inet_opt->opt.optlen : 0));
skb_reset_network_header(skb);
iph = ip_hdr(skb);
*((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
@@ -380,9 +383,9 @@ packet_routed:
iph->daddr = rt->rt_dst;
/* Transport layer set skb->h.foo itself. */

- if (opt && opt->optlen) {
- iph->ihl += opt->optlen >> 2;
- ip_options_build(skb, opt, inet->inet_daddr, rt, 0);
+ if (inet_opt && inet_opt->opt.optlen) {
+ iph->ihl += inet_opt->opt.optlen >> 2;
+ ip_options_build(skb, &inet_opt->opt, inet->inet_daddr, rt, 0);
}

ip_select_ident_more(iph, &rt->u.dst, sk,
@@ -390,10 +393,12 @@ packet_routed:

skb->priority = sk->sk_priority;
skb->mark = sk->sk_mark;
-
- return ip_local_out(skb);
+ res = ip_local_out(skb);
+ rcu_read_unlock();
+ return res;

no_route:
+ rcu_read_unlock();
IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
kfree_skb(skb);
return -EHOSTUNREACH;
@@ -812,7 +817,7 @@ int ip_append_data(struct sock *sk,
/*
* setup for corking.
*/
- opt = ipc->opt;
+ opt = ipc->opt ? &ipc->opt->opt : NULL;
if (opt) {
if (inet->cork.opt == NULL) {
inet->cork.opt = kmalloc(sizeof(struct ip_options) + 40, sk->sk_allocation);
@@ -1371,26 +1376,23 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
unsigned int len)
{
struct inet_sock *inet = inet_sk(sk);
- struct {
- struct ip_options opt;
- char data[40];
- } replyopts;
+ struct ip_options_data replyopts;
struct ipcm_cookie ipc;
__be32 daddr;
struct rtable *rt = skb_rtable(skb);

- if (ip_options_echo(&replyopts.opt, skb))
+ if (ip_options_echo(&replyopts.opt.opt, skb))
return;

daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
ipc.shtx.flags = 0;

- if (replyopts.opt.optlen) {
+ if (replyopts.opt.opt.optlen) {
ipc.opt = &replyopts.opt;

- if (ipc.opt->srr)
- daddr = replyopts.opt.faddr;
+ if (replyopts.opt.opt.srr)
+ daddr = replyopts.opt.opt.faddr;
}

{
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 1e64dabbd232..e4256fe59a30 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -435,6 +435,11 @@ out:
}


+static void opt_kfree_rcu(struct rcu_head *head)
+{
+ kfree(container_of(head, struct ip_options_rcu, rcu));
+}
+
/*
* Socket option code for IP. This is the end of the line after any
* TCP,UDP etc options on an IP socket.
@@ -481,13 +486,15 @@ static int do_ip_setsockopt(struct sock *sk, int level,
switch (optname) {
case IP_OPTIONS:
{
- struct ip_options *opt = NULL;
+ struct ip_options_rcu *old, *opt = NULL;
+
if (optlen > 40)
goto e_inval;
err = ip_options_get_from_user(sock_net(sk), &opt,
optval, optlen);
if (err)
break;
+ old = inet->inet_opt;
if (inet->is_icsk) {
struct inet_connection_sock *icsk = inet_csk(sk);
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
@@ -496,17 +503,18 @@ static int do_ip_setsockopt(struct sock *sk, int level,
(TCPF_LISTEN | TCPF_CLOSE)) &&
inet->inet_daddr != LOOPBACK4_IPV6)) {
#endif
- if (inet->opt)
- icsk->icsk_ext_hdr_len -= inet->opt->optlen;
+ if (old)
+ icsk->icsk_ext_hdr_len -= old->opt.optlen;
if (opt)
- icsk->icsk_ext_hdr_len += opt->optlen;
+ icsk->icsk_ext_hdr_len += opt->opt.optlen;
icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
}
#endif
}
- opt = xchg(&inet->opt, opt);
- kfree(opt);
+ rcu_assign_pointer(inet->inet_opt, opt);
+ if (old)
+ call_rcu(&old->rcu, opt_kfree_rcu);
break;
}
case IP_PKTINFO:
@@ -1042,12 +1050,15 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_OPTIONS:
{
unsigned char optbuf[sizeof(struct ip_options)+40];
- struct ip_options * opt = (struct ip_options *)optbuf;
+ struct ip_options *opt = (struct ip_options *)optbuf;
+ struct ip_options_rcu *inet_opt;
+
+ inet_opt = inet->inet_opt;
opt->optlen = 0;
- if (inet->opt)
- memcpy(optbuf, inet->opt,
- sizeof(struct ip_options)+
- inet->opt->optlen);
+ if (inet_opt)
+ memcpy(optbuf, &inet_opt->opt,
+ sizeof(struct ip_options) +
+ inet_opt->opt.optlen);
release_sock(sk);

if (opt->optlen == 0)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index cc6f097fbd5f..d5f57acd6f4b 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -457,6 +457,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
__be32 saddr;
u8 tos;
int err;
+ struct ip_options_data opt_copy;

err = -EMSGSIZE;
if (len > 0xFFFF)
@@ -517,8 +518,18 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
saddr = ipc.addr;
ipc.addr = daddr;

- if (!ipc.opt)
- ipc.opt = inet->opt;
+ if (!ipc.opt) {
+ struct ip_options_rcu *inet_opt;
+
+ rcu_read_lock();
+ inet_opt = rcu_dereference(inet->inet_opt);
+ if (inet_opt) {
+ memcpy(&opt_copy, inet_opt,
+ sizeof(*inet_opt) + inet_opt->opt.optlen);
+ ipc.opt = &opt_copy.opt;
+ }
+ rcu_read_unlock();
+ }

if (ipc.opt) {
err = -EINVAL;
@@ -527,10 +538,10 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
*/
if (inet->hdrincl)
goto done;
- if (ipc.opt->srr) {
+ if (ipc.opt->opt.srr) {
if (!daddr)
goto done;
- daddr = ipc.opt->faddr;
+ daddr = ipc.opt->opt.faddr;
}
}
tos = RT_CONN_FLAGS(sk);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 9f6b22206c52..95ac6d7e0f42 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -310,10 +310,10 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
* the ACK carries the same options again (see RFC1122 4.2.3.8)
*/
if (opt && opt->optlen) {
- int opt_size = sizeof(struct ip_options) + opt->optlen;
+ int opt_size = sizeof(struct ip_options_rcu) + opt->optlen;

ireq->opt = kmalloc(opt_size, GFP_ATOMIC);
- if (ireq->opt != NULL && ip_options_echo(ireq->opt, skb)) {
+ if (ireq->opt != NULL && ip_options_echo(&ireq->opt->opt, skb)) {
kfree(ireq->opt);
ireq->opt = NULL;
}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ab7165565d23..8a0bff623731 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -153,6 +153,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
__be32 daddr, nexthop;
int tmp;
int err;
+ struct ip_options_rcu *inet_opt;

if (addr_len < sizeof(struct sockaddr_in))
return -EINVAL;
@@ -161,10 +162,11 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return -EAFNOSUPPORT;

nexthop = daddr = usin->sin_addr.s_addr;
- if (inet->opt && inet->opt->srr) {
+ inet_opt = inet->inet_opt;
+ if (inet_opt && inet_opt->opt.srr) {
if (!daddr)
return -EINVAL;
- nexthop = inet->opt->faddr;
+ nexthop = inet_opt->opt.faddr;
}

tmp = ip_route_connect(&rt, nexthop, inet->inet_saddr,
@@ -182,7 +184,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return -ENETUNREACH;
}

- if (!inet->opt || !inet->opt->srr)
+ if (!inet_opt || !inet_opt->opt.srr)
daddr = rt->rt_dst;

if (!inet->inet_saddr)
@@ -216,8 +218,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
inet->inet_daddr = daddr;

inet_csk(sk)->icsk_ext_hdr_len = 0;
- if (inet->opt)
- inet_csk(sk)->icsk_ext_hdr_len = inet->opt->optlen;
+ if (inet_opt)
+ inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;

tp->rx_opt.mss_clamp = TCP_MSS_DEFAULT;

@@ -812,17 +814,18 @@ static void syn_flood_warning(struct sk_buff *skb)
/*
* Save and compile IPv4 options into the request_sock if needed.
*/
-static struct ip_options *tcp_v4_save_options(struct sock *sk,
- struct sk_buff *skb)
+static struct ip_options_rcu *tcp_v4_save_options(struct sock *sk,
+ struct sk_buff *skb)
{
- struct ip_options *opt = &(IPCB(skb)->opt);
- struct ip_options *dopt = NULL;
+ const struct ip_options *opt = &(IPCB(skb)->opt);
+ struct ip_options_rcu *dopt = NULL;

if (opt && opt->optlen) {
- int opt_size = optlength(opt);
+ int opt_size = sizeof(*dopt) + opt->optlen;
+
dopt = kmalloc(opt_size, GFP_ATOMIC);
if (dopt) {
- if (ip_options_echo(dopt, skb)) {
+ if (ip_options_echo(&dopt->opt, skb)) {
kfree(dopt);
dopt = NULL;
}
@@ -1412,6 +1415,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
#ifdef CONFIG_TCP_MD5SIG
struct tcp_md5sig_key *key;
#endif
+ struct ip_options_rcu *inet_opt;

if (sk_acceptq_is_full(sk))
goto exit_overflow;
@@ -1432,13 +1436,14 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
newinet->inet_daddr = ireq->rmt_addr;
newinet->inet_rcv_saddr = ireq->loc_addr;
newinet->inet_saddr = ireq->loc_addr;
- newinet->opt = ireq->opt;
+ inet_opt = ireq->opt;
+ rcu_assign_pointer(newinet->inet_opt, inet_opt);
ireq->opt = NULL;
newinet->mc_index = inet_iif(skb);
newinet->mc_ttl = ip_hdr(skb)->ttl;
inet_csk(newsk)->icsk_ext_hdr_len = 0;
- if (newinet->opt)
- inet_csk(newsk)->icsk_ext_hdr_len = newinet->opt->optlen;
+ if (inet_opt)
+ inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
newinet->inet_id = newtp->write_seq ^ jiffies;

tcp_mtup_init(newsk);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 7932dc68c669..7f0a1ae0544b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -784,6 +784,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
int err, is_udplite = IS_UDPLITE(sk);
int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
+ struct ip_options_data opt_copy;

if (len > 0xFFFF)
return -EMSGSIZE;
@@ -855,22 +856,32 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
free = 1;
connected = 0;
}
- if (!ipc.opt)
- ipc.opt = inet->opt;
+ if (!ipc.opt) {
+ struct ip_options_rcu *inet_opt;
+
+ rcu_read_lock();
+ inet_opt = rcu_dereference(inet->inet_opt);
+ if (inet_opt) {
+ memcpy(&opt_copy, inet_opt,
+ sizeof(*inet_opt) + inet_opt->opt.optlen);
+ ipc.opt = &opt_copy.opt;
+ }
+ rcu_read_unlock();
+ }

saddr = ipc.addr;
ipc.addr = faddr = daddr;

- if (ipc.opt && ipc.opt->srr) {
+ if (ipc.opt && ipc.opt->opt.srr) {
if (!daddr)
return -EINVAL;
- faddr = ipc.opt->faddr;
+ faddr = ipc.opt->opt.faddr;
connected = 0;
}
tos = RT_TOS(inet->tos);
if (sock_flag(sk, SOCK_LOCALROUTE) ||
(msg->msg_flags & MSG_DONTROUTE) ||
- (ipc.opt && ipc.opt->is_strictroute)) {
+ (ipc.opt && ipc.opt->opt.is_strictroute)) {
tos |= RTO_ONLINK;
connected = 0;
}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d854453b4daa..138a2db58bf8 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1446,7 +1446,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,

First: no IPv4 options.
*/
- newinet->opt = NULL;
+ newinet->inet_opt = NULL;
newnp->ipv6_fl_list = NULL;

/* Clone RX bits */
--
1.8.5.2

2014-02-05 21:10:36

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 037/213] sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails

From: Tommi Rantala <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit be364c8c0f17a3dd42707b5a090b318028538eb9 upstream.

Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:

#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
int fd;
struct sockaddr_in sa;

fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;

memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);

sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

return 0;
}

As far as I can tell, the leak has been around since ~2003.

Signed-off-by: Tommi Rantala <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sctp/chunk.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 3eab6db59a37..df773fcaaa03 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -282,7 +282,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
goto errout;
err = sctp_user_addto_chunk(chunk, offset, len, msgh->msg_iov);
if (err < 0)
- goto errout;
+ goto errout_chunk_free;

offset += len;

@@ -322,7 +322,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
__skb_pull(chunk->skb, (__u8 *)chunk->chunk_hdr
- (__u8 *)chunk->skb->data);
if (err < 0)
- goto errout;
+ goto errout_chunk_free;

sctp_datamsg_assign(msg, chunk);
list_add_tail(&chunk->frag_list, &msg->chunks);
@@ -330,6 +330,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,

return msg;

+errout_chunk_free:
+ sctp_chunk_free(chunk);
+
errout:
list_for_each_safe(pos, temp, &msg->chunks) {
list_del_init(pos);
--
1.8.5.2

2014-02-05 21:10:48

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 036/213] net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree

From: Daniel Borkmann <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 6ba542a291a5e558603ac51cda9bded347ce7627 upstream.

In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/sctp/socket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 03daceb2d9a0..38c19d38f438 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3276,7 +3276,7 @@ static int sctp_setsockopt_auth_key(struct sock *sk,

ret = sctp_auth_set_key(sctp_sk(sk)->ep, asoc, authkey);
out:
- kfree(authkey);
+ kzfree(authkey);
return ret;
}

--
1.8.5.2

2014-02-05 20:03:59

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 014/213] ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data

From: Hannes Frederic Sowa <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1 upstream.

We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>] [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
[<ffffffff8159a9aa>] skb_push+0x3a/0x40
[<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
[<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
[<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
[<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
[<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
[<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
[<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
[<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
[<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
[<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <[email protected]>
Cc: YOSHIFUJI Hideaki <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[PG: The line "flowi6 *fl6 = &inet->cork.fl.u.ip6" was
"flowi *fl = &inet->cork.fl" in 2.6.34 kernel.]
Signed-off-by: Paul Gortmaker <[email protected]>
---
include/net/udp.h | 1 +
net/ipv4/udp.c | 3 ++-
net/ipv6/udp.c | 7 ++++++-
3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index b02f5d9f4d7e..acc0a1928c33 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -169,6 +169,7 @@ extern void udp_err(struct sk_buff *, u32);

extern int udp_sendmsg(struct kiocb *iocb, struct sock *sk,
struct msghdr *msg, size_t len);
+extern int udp_push_pending_frames(struct sock *sk);
extern void udp_flush_pending_frames(struct sock *sk);

extern int udp_rcv(struct sk_buff *skb);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 7f0a1ae0544b..5346abf0e095 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -705,7 +705,7 @@ static void udp4_hwcsum_outgoing(struct sock *sk, struct sk_buff *skb,
/*
* Push out all pending data as one UDP datagram. Socket is locked.
*/
-static int udp_push_pending_frames(struct sock *sk)
+int udp_push_pending_frames(struct sock *sk)
{
struct udp_sock *up = udp_sk(sk);
struct inet_sock *inet = inet_sk(sk);
@@ -767,6 +767,7 @@ out:
up->pending = 0;
return err;
}
+EXPORT_SYMBOL(udp_push_pending_frames);

int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index a1d3d32237dd..bf7702a5f0b6 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -872,11 +872,16 @@ static int udp_v6_push_pending_frames(struct sock *sk)
struct udphdr *uh;
struct udp_sock *up = udp_sk(sk);
struct inet_sock *inet = inet_sk(sk);
- struct flowi *fl = &inet->cork.fl;
+ struct flowi *fl;
int err = 0;
int is_udplite = IS_UDPLITE(sk);
__wsum csum = 0;

+ if (up->pending == AF_INET)
+ return udp_push_pending_frames(sk);
+
+ fl = &inet->cork.fl;
+
/* Grab the skbuff where UDP header space exists. */
if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
goto out;
--
1.8.5.2

2014-02-05 20:03:56

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 015/213] jbd/jbd2: validate sb->s_first in journal_get_superblock()

From: Eryu Guan <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 8762202dd0d6e46854f786bdb6fb3780a1625efe upstream.

I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
image has s_first = 0 in journal superblock, and the 0 is passed to
journal->j_head in journal_reset(), then to blocknr in
cleanup_journal_tail(), in the end the J_ASSERT failed.

So validate s_first after reading journal superblock from disk in
journal_get_superblock() to ensure s_first is valid.

The following script could reproduce it:

fstype=ext3
blocksize=1024
img=$fstype.img
offset=0
found=0
magic="c0 3b 39 98"

dd if=/dev/zero of=$img bs=1M count=8
mkfs -t $fstype -b $blocksize -F $img
filesize=`stat -c %s $img`
while [ $offset -lt $filesize ]
do
if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
echo "Found journal: $offset"
found=1
break
fi
offset=`echo "$offset+$blocksize" | bc`
done

if [ $found -ne 1 ];then
echo "Magic \"$magic\" not found"
exit 1
fi

dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1

mkdir -p ./mnt
mount -o loop $img ./mnt

Cc: Jan Kara <[email protected]>
Signed-off-by: Eryu Guan <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd/journal.c | 8 ++++++++
fs/jbd2/journal.c | 8 ++++++++
2 files changed, 16 insertions(+)

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 70713d508a0f..62b66042e104 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -1078,6 +1078,14 @@ static int journal_get_superblock(journal_t *journal)
goto out;
}

+ if (be32_to_cpu(sb->s_first) == 0 ||
+ be32_to_cpu(sb->s_first) >= journal->j_maxlen) {
+ printk(KERN_WARNING
+ "JBD: Invalid start block of journal: %u\n",
+ be32_to_cpu(sb->s_first));
+ goto out;
+ }
+
return 0;

out:
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 7f16feab9917..21bf6cdaedfd 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1194,6 +1194,14 @@ static int journal_get_superblock(journal_t *journal)
goto out;
}

+ if (be32_to_cpu(sb->s_first) == 0 ||
+ be32_to_cpu(sb->s_first) >= journal->j_maxlen) {
+ printk(KERN_WARNING
+ "JBD2: Invalid start block of journal: %u\n",
+ be32_to_cpu(sb->s_first));
+ goto out;
+ }
+
return 0;

out:
--
1.8.5.2

2014-02-05 21:11:41

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 028/213] ax25: fix info leak via msg_name in ax25_recvmsg()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit ef3313e84acbf349caecae942ab3ab731471f1a1 upstream.

When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Cc: Ralf Baechle <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/ax25/af_ax25.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 12c350c63717..d082b6db6b81 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1655,6 +1655,7 @@ static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
ax25_address src;
const unsigned char *mac = skb_mac_header(skb);

+ memset(sax, 0, sizeof(struct full_sockaddr_ax25));
ax25_addr_parse(mac + 1, skb->data - mac - 1, &src, NULL,
&digi, NULL, NULL);
sax->sax25_family = AF_AX25;
--
1.8.5.2

2014-02-05 21:12:02

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 016/213] crypto: ansi_cprng - Fix off by one error in non-block size request

From: Neil Horman <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 714b33d15130cbb5ab426456d4e3de842d6c5b8a upstream.

Stephan Mueller reported to me recently a error in random number generation in
the ansi cprng. If several small requests are made that are less than the
instances block size, the remainder for loop code doesn't increment
rand_data_valid in the last iteration, meaning that the last bytes in the
rand_data buffer gets reused on the subsequent smaller-than-a-block request for
random data.

The fix is pretty easy, just re-code the for loop to make sure that
rand_data_valid gets incremented appropriately

Signed-off-by: Neil Horman <[email protected]>
Reported-by: Stephan Mueller <[email protected]>
CC: Stephan Mueller <[email protected]>
CC: Petr Matousek <[email protected]>
CC: Herbert Xu <[email protected]>
CC: "David S. Miller" <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
crypto/ansi_cprng.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 2bc332142849..5c565d72d1cc 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -230,11 +230,11 @@ remainder:
*/
if (byte_count < DEFAULT_BLK_SZ) {
empty_rbuf:
- for (; ctx->rand_data_valid < DEFAULT_BLK_SZ;
- ctx->rand_data_valid++) {
+ while (ctx->rand_data_valid < DEFAULT_BLK_SZ) {
*ptr = ctx->rand_data[ctx->rand_data_valid];
ptr++;
byte_count--;
+ ctx->rand_data_valid++;
if (byte_count == 0)
goto done;
}
--
1.8.5.2

2014-02-05 20:03:52

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 019/213] HID: provide a helper for validating hid reports

From: Kees Cook <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 331415ff16a12147d57d5c953f3a961b7ede348b upstream.

Many drivers need to validate the characteristics of their HID report
during initialization to avoid misusing the reports. This adds a common
helper to perform validation of the report exisitng, the field existing,
and the expected number of values within the field.

Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
[PG: in 2.6.34 it is err_hid(), original baseline had hid_err().]
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/hid/hid-core.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/hid.h | 4 ++++
2 files changed, 62 insertions(+)

diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index 195e366ea18d..b71ff324bec9 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -808,6 +808,64 @@ static __inline__ int search(__s32 *array, __s32 value, unsigned n)
return -1;
}

+static const char * const hid_report_names[] = {
+ "HID_INPUT_REPORT",
+ "HID_OUTPUT_REPORT",
+ "HID_FEATURE_REPORT",
+};
+/**
+ * hid_validate_values - validate existing device report's value indexes
+ *
+ * @device: hid device
+ * @type: which report type to examine
+ * @id: which report ID to examine (0 for first)
+ * @field_index: which report field to examine
+ * @report_counts: expected number of values
+ *
+ * Validate the number of values in a given field of a given report, after
+ * parsing.
+ */
+struct hid_report *hid_validate_values(struct hid_device *hid,
+ unsigned int type, unsigned int id,
+ unsigned int field_index,
+ unsigned int report_counts)
+{
+ struct hid_report *report;
+
+ if (type > HID_FEATURE_REPORT) {
+ err_hid("invalid HID report type %u\n", type);
+ return NULL;
+ }
+
+ if (id >= HID_MAX_IDS) {
+ err_hid("invalid HID report id %u\n", id);
+ return NULL;
+ }
+
+ /*
+ * Explicitly not using hid_get_report() here since it depends on
+ * ->numbered being checked, which may not always be the case when
+ * drivers go to access report values.
+ */
+ report = hid->report_enum[type].report_id_hash[id];
+ if (!report) {
+ err_hid("missing %s %u\n", hid_report_names[type], id);
+ return NULL;
+ }
+ if (report->maxfield <= field_index) {
+ err_hid("not enough fields in %s %u\n",
+ hid_report_names[type], id);
+ return NULL;
+ }
+ if (report->field[field_index]->report_count < report_counts) {
+ err_hid("not enough values in %s %u field %u\n",
+ hid_report_names[type], id, field_index);
+ return NULL;
+ }
+ return report;
+}
+EXPORT_SYMBOL_GPL(hid_validate_values);
+
/**
* hid_match_report - check if driver's raw_event should be called
*
diff --git a/include/linux/hid.h b/include/linux/hid.h
index 85e0942cfd76..cd7049a670bc 100644
--- a/include/linux/hid.h
+++ b/include/linux/hid.h
@@ -694,6 +694,10 @@ void hid_output_report(struct hid_report *report, __u8 *data);
struct hid_device *hid_allocate_device(void);
struct hid_report *hid_register_report(struct hid_device *device, unsigned type, unsigned id);
int hid_parse_report(struct hid_device *hid, __u8 *start, unsigned size);
+struct hid_report *hid_validate_values(struct hid_device *hid,
+ unsigned int type, unsigned int id,
+ unsigned int field_index,
+ unsigned int report_counts);
int hid_check_keys_pressed(struct hid_device *hid);
int hid_connect(struct hid_device *hid, unsigned int connect_mask);
void hid_disconnect(struct hid_device *hid);
--
1.8.5.2

2014-02-05 21:12:25

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 031/213] atm: update msg_namelen in vcc_recvmsg()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 9b3e617f3df53822345a8573b6d358f6b9e5ed87 upstream.

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.

Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/atm/common.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/atm/common.c b/net/atm/common.c
index ec2cb8f3c8d9..4ccc872209f0 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -475,6 +475,8 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
struct sk_buff *skb;
int copied, error = -EINVAL;

+ msg->msg_namelen = 0;
+
if (sock->state != SS_CONNECTED)
return -ENOTCONN;
if (flags & ~MSG_DONTWAIT) /* only handle MSG_DONTWAIT */
--
1.8.5.2

2014-02-05 21:12:54

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 026/213] iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit a5598bd9c087dc0efc250a5221e5d0e6f584ee88 upstream.

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.

Cc: Ursula Braun <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/iucv/af_iucv.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index c18286a2167b..1e3b2ec74622 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1160,6 +1160,8 @@ static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
struct sk_buff *skb, *rskb, *cskb;
int err = 0;

+ msg->msg_namelen = 0;
+
if ((sk->sk_state == IUCV_DISCONN || sk->sk_state == IUCV_SEVERED) &&
skb_queue_empty(&iucv->backlog_skb_q) &&
skb_queue_empty(&sk->sk_receive_queue) &&
--
1.8.5.2

2014-02-05 21:12:51

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 030/213] atm: fix info leak via getsockname()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 3c0c5cfdcd4d69ffc4b9c0907cec99039f30a50a upstream.

The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/atm/pvc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index 437ee70c5e62..db0dd47de61b 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -94,6 +94,7 @@ static int pvc_getname(struct socket *sock, struct sockaddr *sockaddr,
return -ENOTCONN;
*sockaddr_len = sizeof(struct sockaddr_atmpvc);
addr = (struct sockaddr_atmpvc *)sockaddr;
+ memset(addr, 0, sizeof(*addr));
addr->sap_family = AF_ATMPVC;
addr->sap_addr.itf = vcc->dev->number;
addr->sap_addr.vpi = vcc->vpi;
--
1.8.5.2

2014-02-05 21:13:36

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 025/213] llc: Fix missing msg_namelen update in llc_ui_recvmsg()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit c77a4b9cffb6215a15196ec499490d116dfad181 upstream.

For stream sockets the code misses to update the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.

Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/llc/af_llc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 06010e1e89f9..121c92e3c128 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -719,6 +719,8 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
int target; /* Read at least this many bytes */
long timeo;

+ msg->msg_namelen = 0;
+
lock_sock(sk);
copied = -ENOTCONN;
if (unlikely(sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN))
--
1.8.5.2

2014-02-05 21:13:32

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 029/213] atm: fix info leak in getsockopt(SO_ATMPVC)

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit e862f1a9b7df4e8196ebec45ac62295138aa3fc2 upstream.

The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/atm/common.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/atm/common.c b/net/atm/common.c
index 97ed94aa0cbc..ec2cb8f3c8d9 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -759,6 +759,7 @@ int vcc_getsockopt(struct socket *sock, int level, int optname,

if (!vcc->dev || !test_bit(ATM_VF_ADDR, &vcc->flags))
return -ENOTCONN;
+ memset(&pvc, 0, sizeof(pvc));
pvc.sap_family = AF_ATMPVC;
pvc.sap_addr.itf = vcc->dev->number;
pvc.sap_addr.vpi = vcc->vpi;
--
1.8.5.2

2014-02-05 21:14:09

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 022/213] rose: fix info leak via msg_name in rose_recvmsg()

From: Mathias Krause <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 4a184233f21645cf0b719366210ed445d1024d72 upstream.

The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.

Fix the issue by initializing the memory used for sockaddr info with
memset(0).

Cc: Ralf Baechle <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/rose/af_rose.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 547e5cd6d375..11072546d7e9 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1277,6 +1277,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);

if (srose != NULL) {
+ memset(srose, 0, msg->msg_namelen);
srose->srose_family = AF_ROSE;
srose->srose_addr = rose->dest_addr;
srose->srose_call = rose->dest_call;
--
1.8.5.2

2014-02-05 21:14:33

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 011/213] block: fail SCSI passthrough ioctls on partition devices

From: Paolo Bonzini <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 0bfc96cb77224736dfa35c3c555d37b3646ef35e upstream.

Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
will pass the command to the underlying block device. This is
well-known, but it is also a large security problem when (via Unix
permissions, ACLs, SELinux or a combination thereof) a program or user
needs to be granted access only to part of the disk.

This patch lets partitions forward a small set of harmless ioctls;
others are logged with printk so that we can see which ioctls are
actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred.
Of course it was being sent to a (partition on a) hard disk, so it would
have failed with ENOTTY and the patch isn't changing anything in
practice. Still, I'm treating it specially to avoid spamming the logs.

In principle, this restriction should include programs running with
CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and
/dev/sdb, it still should not be able to read/write outside the
boundaries of /dev/sda2 independent of the capabilities. However, for
now programs with CAP_SYS_RAWIO will still be allowed to send the
ioctls. Their actions will still be logged.

This patch does not affect the non-libata IDE driver. That driver
however already tests for bd != bd->bd_contains before issuing some
ioctl; it could be restricted further to forbid these ioctls even for
programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.

Cc: [email protected]
Cc: Jens Axboe <[email protected]>
Cc: James Bottomley <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[ Make it also print the command name when warning - Linus ]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
block/scsi_ioctl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
drivers/scsi/sd.c | 11 +++++++++--
include/linux/blkdev.h | 1 +
3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 57ac93754841..b661f8940ef5 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -24,6 +24,7 @@
#include <linux/capability.h>
#include <linux/completion.h>
#include <linux/cdrom.h>
+#include <linux/ratelimit.h>
#include <linux/slab.h>
#include <linux/times.h>
#include <asm/uaccess.h>
@@ -691,9 +692,53 @@ int scsi_cmd_ioctl(struct request_queue *q, struct gendisk *bd_disk, fmode_t mod
}
EXPORT_SYMBOL(scsi_cmd_ioctl);

+int scsi_verify_blk_ioctl(struct block_device *bd, unsigned int cmd)
+{
+ if (bd && bd == bd->bd_contains)
+ return 0;
+
+ /* Actually none of these is particularly useful on a partition,
+ * but they are safe.
+ */
+ switch (cmd) {
+ case SCSI_IOCTL_GET_IDLUN:
+ case SCSI_IOCTL_GET_BUS_NUMBER:
+ case SCSI_IOCTL_GET_PCI:
+ case SCSI_IOCTL_PROBE_HOST:
+ case SG_GET_VERSION_NUM:
+ case SG_SET_TIMEOUT:
+ case SG_GET_TIMEOUT:
+ case SG_GET_RESERVED_SIZE:
+ case SG_SET_RESERVED_SIZE:
+ case SG_EMULATED_HOST:
+ return 0;
+ case CDROM_GET_CAPABILITY:
+ /* Keep this until we remove the printk below. udev sends it
+ * and we do not want to spam dmesg about it. CD-ROMs do
+ * not have partitions, so we get here only for disks.
+ */
+ return -ENOIOCTLCMD;
+ default:
+ break;
+ }
+
+ /* In particular, rule out all resets and host-specific ioctls. */
+ printk_ratelimited(KERN_WARNING
+ "%s: sending ioctl %x to a partition!\n", current->comm, cmd);
+
+ return capable(CAP_SYS_RAWIO) ? 0 : -ENOIOCTLCMD;
+}
+EXPORT_SYMBOL(scsi_verify_blk_ioctl);
+
int scsi_cmd_blk_ioctl(struct block_device *bd, fmode_t mode,
unsigned int cmd, void __user *arg)
{
+ int ret;
+
+ ret = scsi_verify_blk_ioctl(bd, cmd);
+ if (ret < 0)
+ return ret;
+
return scsi_cmd_ioctl(bd->bd_disk->queue, bd->bd_disk, mode, cmd, arg);
}
EXPORT_SYMBOL(scsi_cmd_blk_ioctl);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 654e2674e7c3..1b543a229487 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -886,6 +886,10 @@ static int sd_ioctl(struct block_device *bdev, fmode_t mode,
SCSI_LOG_IOCTL(1, printk("sd_ioctl: disk=%s, cmd=0x%x\n",
disk->disk_name, cmd));

+ error = scsi_verify_blk_ioctl(bdev, cmd);
+ if (error < 0)
+ return error;
+
/*
* If we are in the middle of error recovery, don't let anyone
* else try and use this device. Also, if error recovery fails, it
@@ -1065,6 +1069,11 @@ static int sd_compat_ioctl(struct block_device *bdev, fmode_t mode,
unsigned int cmd, unsigned long arg)
{
struct scsi_device *sdev = scsi_disk(bdev->bd_disk)->device;
+ int ret;
+
+ ret = scsi_verify_blk_ioctl(bdev, cmd);
+ if (ret < 0)
+ return ret;

/*
* If we are in the middle of error recovery, don't let anyone
@@ -1076,8 +1085,6 @@ static int sd_compat_ioctl(struct block_device *bdev, fmode_t mode,
return -ENODEV;

if (sdev->host->hostt->compat_ioctl) {
- int ret;
-
ret = sdev->host->hostt->compat_ioctl(sdev, cmd, (void __user *)arg);

return ret;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ba55e497f7dc..22713e8385f0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -793,6 +793,7 @@ extern void blk_plug_device(struct request_queue *);
extern void blk_plug_device_unlocked(struct request_queue *);
extern int blk_remove_plug(struct request_queue *);
extern void blk_recount_segments(struct request_queue *, struct bio *);
+extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
unsigned int, void __user *);
extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
--
1.8.5.2

2014-02-05 21:14:30

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 027/213] isdnloop: fix and simplify isdnloop_init()

From: Wu Fengguang <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 77f00f6324cb97cf1df6f9c4aaeea6ada23abdb2 upstream.

Fix a buffer overflow bug by removing the revision and printk.

[ 22.016214] isdnloop-ISDN-driver Rev 1.11.6.7
[ 22.097508] isdnloop: (loop0) virtual card added
[ 22.174400] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff83244972
[ 22.174400]
[ 22.436157] Pid: 1, comm: swapper Not tainted 3.5.0-bisect-00018-gfa8bbb1-dirty #129
[ 22.624071] Call Trace:
[ 22.720558] [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[ 22.815248] [<ffffffff8222b623>] panic+0x110/0x329
[ 22.914330] [<ffffffff83244972>] ? isdnloop_init+0xaf/0xb1
[ 23.014800] [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[ 23.090763] [<ffffffff8108e24b>] __stack_chk_fail+0x2b/0x30
[ 23.185748] [<ffffffff83244972>] isdnloop_init+0xaf/0xb1

Signed-off-by: Fengguang Wu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/isdn/isdnloop/isdnloop.c | 12 ------------
1 file changed, 12 deletions(-)

diff --git a/drivers/isdn/isdnloop/isdnloop.c b/drivers/isdn/isdnloop/isdnloop.c
index b8a1098b66ed..385279b2d54c 100644
--- a/drivers/isdn/isdnloop/isdnloop.c
+++ b/drivers/isdn/isdnloop/isdnloop.c
@@ -16,7 +16,6 @@
#include <linux/sched.h>
#include "isdnloop.h"

-static char *revision = "$Revision: 1.11.6.7 $";
static char *isdnloop_id = "loop0";

MODULE_DESCRIPTION("ISDN4Linux: Pseudo Driver that simulates an ISDN card");
@@ -1494,17 +1493,6 @@ isdnloop_addcard(char *id1)
static int __init
isdnloop_init(void)
{
- char *p;
- char rev[10];
-
- if ((p = strchr(revision, ':'))) {
- strcpy(rev, p + 1);
- p = strchr(rev, '$');
- *p = 0;
- } else
- strcpy(rev, " ??? ");
- printk(KERN_NOTICE "isdnloop-ISDN-driver Rev%s\n", rev);
-
if (isdnloop_id)
return (isdnloop_addcard(isdnloop_id));

--
1.8.5.2

2014-02-05 20:03:45

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 018/213] HID: pantherlord: validate output report details

From: Kees Cook <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 412f30105ec6735224535791eed5cdc02888ecb4 upstream.

A HID device could send a malicious output report that would cause the
pantherlord HID driver to write beyond the output report allocation
during initialization, causing a heap overflow:

[ 310.939483] usb 1-1: New USB device found, idVendor=0e8f, idProduct=0003
...
[ 315.980774] BUG kmalloc-192 (Tainted: G W ): Redzone overwritten

CVE-2013-2892

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/hid/hid-pl.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/hid/hid-pl.c b/drivers/hid/hid-pl.c
index 9f41e2bd8483..427f3a57aa23 100644
--- a/drivers/hid/hid-pl.c
+++ b/drivers/hid/hid-pl.c
@@ -129,8 +129,14 @@ static int plff_init(struct hid_device *hid)
strong = &report->field[0]->value[2];
weak = &report->field[0]->value[3];
debug("detected single-field device");
- } else if (report->maxfield >= 4 && report->field[0]->maxusage == 1 &&
- report->field[0]->usage[0].hid == (HID_UP_LED | 0x43)) {
+ } else if (report->field[0]->maxusage == 1 &&
+ report->field[0]->usage[0].hid ==
+ (HID_UP_LED | 0x43) &&
+ report->maxfield >= 4 &&
+ report->field[0]->report_count >= 1 &&
+ report->field[1]->report_count >= 1 &&
+ report->field[2]->report_count >= 1 &&
+ report->field[3]->report_count >= 1) {
report->field[0]->value[0] = 0x00;
report->field[1]->value[0] = 0x00;
strong = &report->field[2]->value[0];
--
1.8.5.2

2014-02-05 21:15:29

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 023/213] rds: set correct msg_namelen

From: Weiping Pan <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 06b6a1cf6e776426766298d055bb3991957d90a7 upstream.

Jay Fenlason ([email protected]) found a bug,
that recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it was called with a address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len paramater properly before
returning, but that's just a bug.
There are also a number of cases wher recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.

And I write two test programs to reproduce this bug, you will see that in
rds_server, fromAddr will be overwritten and the following sock_fd will be
destroyed.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of address to user space in
such case.

How to run the test programs ?
I test them on 32bit x86 system, 3.5.0-rc7.

1 compile
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c

2 run ./rds_server on one console

3 run ./rds_client on another console

4 you will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor

/***************** rds_client.c ********************/

int main(void)
{
int sock_fd;
struct sockaddr_in serverAddr;
struct sockaddr_in toAddr;
char recvBuffer[128] = "data from client";
struct msghdr msg;
struct iovec iov;

sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if (sock_fd < 0) {
perror("create socket error\n");
exit(1);
}

memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4001);

if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind() error\n");
close(sock_fd);
exit(1);
}

memset(&toAddr, 0, sizeof(toAddr));
toAddr.sin_family = AF_INET;
toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
toAddr.sin_port = htons(4000);
msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;

if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendto() error\n");
close(sock_fd);
exit(1);
}

printf("client send data:%s\n", recvBuffer);

memset(recvBuffer, '\0', 128);

msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}

printf("receive data from server:%s\n", recvBuffer);

close(sock_fd);

return 0;
}

/***************** rds_server.c ********************/

int main(void)
{
struct sockaddr_in fromAddr;
int sock_fd;
struct sockaddr_in serverAddr;
unsigned int addrLen;
char recvBuffer[128];
struct msghdr msg;
struct iovec iov;

sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if(sock_fd < 0) {
perror("create socket error\n");
exit(0);
}

memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4000);
if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind error\n");
close(sock_fd);
exit(1);
}

printf("server is waiting to receive data...\n");
msg.msg_name = &fromAddr;

/*
* I add 16 to sizeof(fromAddr), ie 32,
* and pay attention to the definition of fromAddr,
* recvmsg() will overwrite sock_fd,
* since kernel will copy 32 bytes to userspace.
*
* If you just use sizeof(fromAddr), it works fine.
* */
msg.msg_namelen = sizeof(fromAddr) + 16;
/* msg.msg_namelen = sizeof(fromAddr); */
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;

while (1) {
printf("old socket fd=%d\n", sock_fd);
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}
printf("server received data from client:%s\n", recvBuffer);
printf("msg.msg_namelen=%d\n", msg.msg_namelen);
printf("new socket fd=%d\n", sock_fd);
strcat(recvBuffer, "--data from server");
if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendmsg()\n");
close(sock_fd);
exit(1);
}
}

close(sock_fd);
return 0;
}

Signed-off-by: Weiping Pan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/rds/recv.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/rds/recv.c b/net/rds/recv.c
index 93aadc0173cb..de4d79c56662 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -411,6 +411,8 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,

rdsdebug("size %zu flags 0x%x timeo %ld\n", size, msg_flags, timeo);

+ msg->msg_namelen = 0;
+
if (msg_flags & MSG_OOB)
goto out;

@@ -486,6 +488,7 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
sin->sin_port = inc->i_hdr.h_sport;
sin->sin_addr.s_addr = inc->i_saddr;
memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
+ msg->msg_namelen = sizeof(*sin);
}
break;
}
--
1.8.5.2

2014-02-05 21:15:59

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 021/213] HID: LG: validate HID output report details

From: Kees Cook <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 0fb6bd06e06792469acc15bbe427361b56ada528 upstream.

A HID device could send a malicious output report that would cause the
lg, lg3, and lg4 HID drivers to write beyond the output report allocation
during an event, causing a heap overflow:

[ 325.245240] usb 1-1: New USB device found, idVendor=046d, idProduct=c287
...
[ 414.518960] BUG kmalloc-4096 (Not tainted): Redzone overwritten

Additionally, while lg2 did correctly validate the report details, it was
cleaned up and shortened.

CVE-2013-2893

Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
[PG: drop hid-lg4ff.c chunk; file not present in 2.6.34 baseline.]
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/hid/hid-lg2ff.c | 19 +++----------------
drivers/hid/hid-lg3ff.c | 29 ++++++-----------------------
drivers/hid/hid-lgff.c | 17 ++---------------
3 files changed, 11 insertions(+), 54 deletions(-)

diff --git a/drivers/hid/hid-lg2ff.c b/drivers/hid/hid-lg2ff.c
index d888f1e6794f..2b8109d7fc6a 100644
--- a/drivers/hid/hid-lg2ff.c
+++ b/drivers/hid/hid-lg2ff.c
@@ -66,26 +66,13 @@ int lg2ff_init(struct hid_device *hid)
struct hid_report *report;
struct hid_input *hidinput = list_entry(hid->inputs.next,
struct hid_input, list);
- struct list_head *report_list =
- &hid->report_enum[HID_OUTPUT_REPORT].report_list;
struct input_dev *dev = hidinput->input;
int error;

- if (list_empty(report_list)) {
- dev_err(&hid->dev, "no output report found\n");
+ /* Check that the report looks ok */
+ report = hid_validate_values(hid, HID_OUTPUT_REPORT, 0, 0, 7);
+ if (!report)
return -ENODEV;
- }
-
- report = list_entry(report_list->next, struct hid_report, list);
-
- if (report->maxfield < 1) {
- dev_err(&hid->dev, "output report is empty\n");
- return -ENODEV;
- }
- if (report->field[0]->report_count < 7) {
- dev_err(&hid->dev, "not enough values in the field\n");
- return -ENODEV;
- }

lg2ff = kmalloc(sizeof(struct lg2ff_device), GFP_KERNEL);
if (!lg2ff)
diff --git a/drivers/hid/hid-lg3ff.c b/drivers/hid/hid-lg3ff.c
index 4002832ee4af..b998b1675f62 100644
--- a/drivers/hid/hid-lg3ff.c
+++ b/drivers/hid/hid-lg3ff.c
@@ -68,10 +68,11 @@ static int hid_lg3ff_play(struct input_dev *dev, void *data,
int x, y;

/*
- * Maxusage should always be 63 (maximum fields)
- * likely a better way to ensure this data is clean
+ * Available values in the field should always be 63, but we only use up to
+ * 35. Instead, clear the entire area, however big it is.
*/
- memset(report->field[0]->value, 0, sizeof(__s32)*report->field[0]->maxusage);
+ memset(report->field[0]->value, 0,
+ sizeof(__s32) * report->field[0]->report_count);

switch (effect->type) {
case FF_CONSTANT:
@@ -131,32 +132,14 @@ static const signed short ff3_joystick_ac[] = {
int lg3ff_init(struct hid_device *hid)
{
struct hid_input *hidinput = list_entry(hid->inputs.next, struct hid_input, list);
- struct list_head *report_list = &hid->report_enum[HID_OUTPUT_REPORT].report_list;
struct input_dev *dev = hidinput->input;
- struct hid_report *report;
- struct hid_field *field;
const signed short *ff_bits = ff3_joystick_ac;
int error;
int i;

- /* Find the report to use */
- if (list_empty(report_list)) {
- err_hid("No output report found");
- return -1;
- }
-
/* Check that the report looks ok */
- report = list_entry(report_list->next, struct hid_report, list);
- if (!report) {
- err_hid("NULL output report");
- return -1;
- }
-
- field = report->field[0];
- if (!field) {
- err_hid("NULL field");
- return -1;
- }
+ if (!hid_validate_values(hid, HID_OUTPUT_REPORT, 0, 0, 35))
+ return -ENODEV;

/* Assume single fixed device G940 */
for (i = 0; ff_bits[i] >= 0; i++)
diff --git a/drivers/hid/hid-lgff.c b/drivers/hid/hid-lgff.c
index 61142b76a9b1..60978a33eeb6 100644
--- a/drivers/hid/hid-lgff.c
+++ b/drivers/hid/hid-lgff.c
@@ -136,27 +136,14 @@ static void hid_lgff_set_autocenter(struct input_dev *dev, u16 magnitude)
int lgff_init(struct hid_device* hid)
{
struct hid_input *hidinput = list_entry(hid->inputs.next, struct hid_input, list);
- struct list_head *report_list = &hid->report_enum[HID_OUTPUT_REPORT].report_list;
struct input_dev *dev = hidinput->input;
- struct hid_report *report;
- struct hid_field *field;
const signed short *ff_bits = ff_joystick;
int error;
int i;

- /* Find the report to use */
- if (list_empty(report_list)) {
- err_hid("No output report found");
- return -1;
- }
-
/* Check that the report looks ok */
- report = list_entry(report_list->next, struct hid_report, list);
- field = report->field[0];
- if (!field) {
- err_hid("NULL field");
- return -1;
- }
+ if (!hid_validate_values(hid, HID_OUTPUT_REPORT, 0, 0, 7))
+ return -ENODEV;

for (i = 0; i < ARRAY_SIZE(devices); i++) {
if (dev->id.vendor == devices[i].idVendor &&
--
1.8.5.2

2014-02-05 21:16:10

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 020/213] HID: zeroplus: validate output report details

From: Kees Cook <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 78214e81a1bf43740ce89bb5efda78eac2f8ef83 upstream.

The zeroplus HID driver was not checking the size of allocated values
in fields it used. A HID device could send a malicious output report
that would cause the driver to write beyond the output report allocation
during initialization, causing a heap overflow:

[ 1442.728680] usb 1-1: New USB device found, idVendor=0c12, idProduct=0005
...
[ 1466.243173] BUG kmalloc-192 (Tainted: G W ): Redzone overwritten

CVE-2013-2889

Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/hid/hid-zpff.c | 18 +++++-------------
1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/hid/hid-zpff.c b/drivers/hid/hid-zpff.c
index b7acceabba80..a24f9fb43c34 100644
--- a/drivers/hid/hid-zpff.c
+++ b/drivers/hid/hid-zpff.c
@@ -69,21 +69,13 @@ static int zpff_init(struct hid_device *hid)
struct hid_report *report;
struct hid_input *hidinput = list_entry(hid->inputs.next,
struct hid_input, list);
- struct list_head *report_list =
- &hid->report_enum[HID_OUTPUT_REPORT].report_list;
struct input_dev *dev = hidinput->input;
- int error;
+ int i, error;

- if (list_empty(report_list)) {
- dev_err(&hid->dev, "no output report found\n");
- return -ENODEV;
- }
-
- report = list_entry(report_list->next, struct hid_report, list);
-
- if (report->maxfield < 4) {
- dev_err(&hid->dev, "not enough fields in report\n");
- return -ENODEV;
+ for (i = 0; i < 4; i++) {
+ report = hid_validate_values(hid, HID_OUTPUT_REPORT, 0, i, 1);
+ if (!report)
+ return -ENODEV;
}

zpff = kzalloc(sizeof(struct zpff_device), GFP_KERNEL);
--
1.8.5.2

2014-02-05 20:03:32

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 003/213] Revert "percpu: fix chunk range calculation"

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

This reverts commit 264266e6897dd81c894d1c5cbd90b133707b32f3.

The backport had dependencies on other mm/percpu.c restructurings,
like those in commit 020ec6537aa65c18e9084c568d7b94727f2026fd
("percpu: factor out pcpu_addr_in_first/reserved_chunk() and update
per_cpu_ptr_to_phys()"). Rather than drag in more changes, we simply
revert the incomplete backport.

Reported-by: George G. Davis <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
mm/percpu.c | 46 ++++++++++++++++++++--------------------------
1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 83523d9a351b..558543b33b52 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -111,9 +111,9 @@ static int pcpu_atom_size __read_mostly;
static int pcpu_nr_slots __read_mostly;
static size_t pcpu_chunk_struct_size __read_mostly;

-/* cpus with the lowest and highest unit addresses */
-static unsigned int pcpu_low_unit_cpu __read_mostly;
-static unsigned int pcpu_high_unit_cpu __read_mostly;
+/* cpus with the lowest and highest unit numbers */
+static unsigned int pcpu_first_unit_cpu __read_mostly;
+static unsigned int pcpu_last_unit_cpu __read_mostly;

/* the address of the first chunk which starts with the kernel static area */
void *pcpu_base_addr __read_mostly;
@@ -747,8 +747,8 @@ static void pcpu_pre_unmap_flush(struct pcpu_chunk *chunk,
int page_start, int page_end)
{
flush_cache_vunmap(
- pcpu_chunk_addr(chunk, pcpu_low_unit_cpu, page_start),
- pcpu_chunk_addr(chunk, pcpu_high_unit_cpu, page_end));
+ pcpu_chunk_addr(chunk, pcpu_first_unit_cpu, page_start),
+ pcpu_chunk_addr(chunk, pcpu_last_unit_cpu, page_end));
}

static void __pcpu_unmap_pages(unsigned long addr, int nr_pages)
@@ -810,8 +810,8 @@ static void pcpu_post_unmap_tlb_flush(struct pcpu_chunk *chunk,
int page_start, int page_end)
{
flush_tlb_kernel_range(
- pcpu_chunk_addr(chunk, pcpu_low_unit_cpu, page_start),
- pcpu_chunk_addr(chunk, pcpu_high_unit_cpu, page_end));
+ pcpu_chunk_addr(chunk, pcpu_first_unit_cpu, page_start),
+ pcpu_chunk_addr(chunk, pcpu_last_unit_cpu, page_end));
}

static int __pcpu_map_pages(unsigned long addr, struct page **pages,
@@ -888,8 +888,8 @@ static void pcpu_post_map_flush(struct pcpu_chunk *chunk,
int page_start, int page_end)
{
flush_cache_vmap(
- pcpu_chunk_addr(chunk, pcpu_low_unit_cpu, page_start),
- pcpu_chunk_addr(chunk, pcpu_high_unit_cpu, page_end));
+ pcpu_chunk_addr(chunk, pcpu_first_unit_cpu, page_start),
+ pcpu_chunk_addr(chunk, pcpu_last_unit_cpu, page_end));
}

/**
@@ -1345,19 +1345,19 @@ phys_addr_t per_cpu_ptr_to_phys(void *addr)
{
void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
bool in_first_chunk = false;
- unsigned long first_low, first_high;
+ unsigned long first_start, first_end;
unsigned int cpu;

/*
- * The following test on unit_low/high isn't strictly
+ * The following test on first_start/end isn't strictly
* necessary but will speed up lookups of addresses which
* aren't in the first chunk.
*/
- first_low = pcpu_chunk_addr(pcpu_first_chunk, pcpu_low_unit_cpu, 0);
- first_high = pcpu_chunk_addr(pcpu_first_chunk, pcpu_high_unit_cpu,
- pcpu_unit_pages);
- if ((unsigned long)addr >= first_low &&
- (unsigned long)addr < first_high) {
+ first_start = pcpu_chunk_addr(pcpu_first_chunk, pcpu_first_unit_cpu, 0);
+ first_end = pcpu_chunk_addr(pcpu_first_chunk, pcpu_last_unit_cpu,
+ pcpu_unit_pages);
+ if ((unsigned long)addr >= first_start &&
+ (unsigned long)addr < first_end) {
for_each_possible_cpu(cpu) {
void *start = per_cpu_ptr(base, cpu);

@@ -1754,9 +1754,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,

for (cpu = 0; cpu < nr_cpu_ids; cpu++)
unit_map[cpu] = UINT_MAX;
-
- pcpu_low_unit_cpu = NR_CPUS;
- pcpu_high_unit_cpu = NR_CPUS;
+ pcpu_first_unit_cpu = NR_CPUS;

for (group = 0, unit = 0; group < ai->nr_groups; group++, unit += i) {
const struct pcpu_group_info *gi = &ai->groups[group];
@@ -1776,13 +1774,9 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
unit_map[cpu] = unit + i;
unit_off[cpu] = gi->base_offset + i * ai->unit_size;

- /* determine low/high unit_cpu */
- if (pcpu_low_unit_cpu == NR_CPUS ||
- unit_off[cpu] < unit_off[pcpu_low_unit_cpu])
- pcpu_low_unit_cpu = cpu;
- if (pcpu_high_unit_cpu == NR_CPUS ||
- unit_off[cpu] > unit_off[pcpu_high_unit_cpu])
- pcpu_high_unit_cpu = cpu;
+ if (pcpu_first_unit_cpu == NR_CPUS)
+ pcpu_first_unit_cpu = cpu;
+ pcpu_last_unit_cpu = cpu;
}
}
pcpu_nr_units = unit;
--
1.8.5.2

2014-02-05 21:16:49

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 012/213] dm: do not forward ioctls from logical volumes to the underlying device

From: Paolo Bonzini <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit ec8013beddd717d1740cfefb1a9b900deef85462 upstream.

A logical volume can map to just part of underlying physical volume.
In this case, it must be treated like a partition.

Based on a patch from Alasdair G Kergon.

Cc: Alasdair G Kergon <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[PG: drop drivers/md/dm-flakey.c chunk; file not present in 2.6.34]
Signed-off-by: Paul Gortmaker <[email protected]>
---
drivers/md/dm-linear.c | 12 +++++++++++-
drivers/md/dm-mpath.c | 6 ++++++
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 9200dbf2391a..bf404845aa9d 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -115,7 +115,17 @@ static int linear_ioctl(struct dm_target *ti, unsigned int cmd,
unsigned long arg)
{
struct linear_c *lc = (struct linear_c *) ti->private;
- return __blkdev_driver_ioctl(lc->dev->bdev, lc->dev->mode, cmd, arg);
+ struct dm_dev *dev = lc->dev;
+ int r = 0;
+
+ /*
+ * Only pass ioctls through if the device sizes match exactly.
+ */
+ if (lc->start ||
+ ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
+ r = scsi_verify_blk_ioctl(NULL, cmd);
+
+ return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
}

static int linear_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 78090ebaba6c..3c06fdb13868 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1541,6 +1541,12 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,

spin_unlock_irqrestore(&m->lock, flags);

+ /*
+ * Only pass ioctls through if the device sizes match exactly.
+ */
+ if (!r && ti->len != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT)
+ r = scsi_verify_blk_ioctl(NULL, cmd);
+
return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg);
}

--
1.8.5.2

2014-02-05 20:03:27

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 002/213] udf: fix udf_error build warnings

From: "George G. Davis" <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

The v2.6.34.14 commit "d7542a6 udf: Avoid run away loop when partition
table length is corrupted" introduced the following build warning due
to a change in the number of args in udf_error/udf_err for v2.6.34.14:

CC fs/udf/super.o
fs/udf/super.c: In function 'udf_load_sparable_map':
fs/udf/super.c:1259: warning: passing argument 3 of 'udf_error' makes pointer from integer without a cast
fs/udf/super.c:1265: warning: passing argument 3 of 'udf_error' makes pointer from integer without a cast
fs/udf/super.c: In function 'udf_load_logicalvol':
fs/udf/super.c:1313: warning: passing argument 3 of 'udf_error' makes pointer from integer without a cast

The above warnings are due to a missing __func__ argument in the above
udf_error function calls. This is because of commit 8076c363da15e7
("udf: Rename udf_error to udf_err") which removed the __func__ arg.

Restore the missing __func__ argument to fix the build warnings.

Signed-off-by: George G. Davis <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/udf/super.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/udf/super.c b/fs/udf/super.c
index 1d36fdd4ae56..5ece6d6721f8 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -1254,13 +1254,15 @@ static int udf_load_sparable_map(struct super_block *sb,
map->s_partition_type = UDF_SPARABLE_MAP15;
sdata->s_packet_len = le16_to_cpu(spm->packetLength);
if (!is_power_of_2(sdata->s_packet_len)) {
- udf_error(sb, "error loading logical volume descriptor: "
+ udf_error(sb, __func__,
+ "error loading logical volume descriptor: "
"Invalid packet length %u\n",
(unsigned)sdata->s_packet_len);
return -EIO;
}
if (spm->numSparingTables > 4) {
- udf_error(sb, "error loading logical volume descriptor: "
+ udf_error(sb, __func__,
+ "error loading logical volume descriptor: "
"Too many sparing tables (%d)\n",
(int)spm->numSparingTables);
return -EIO;
@@ -1308,7 +1310,8 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
lvd = (struct logicalVolDesc *)bh->b_data;
table_len = le32_to_cpu(lvd->mapTableLength);
if (table_len > sb->s_blocksize - sizeof(*lvd)) {
- udf_error(sb, "error loading logical volume descriptor: "
+ udf_error(sb, __func__,
+ "error loading logical volume descriptor: "
"Partition table too long (%u > %lu)\n", table_len,
sb->s_blocksize - sizeof(*lvd));
goto out_bh;
--
1.8.5.2

2014-02-05 21:17:09

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 009/213] bridge: Fix mglist corruption that leads to memory corruption

From: Herbert Xu <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 6b0d6a9b4296fa16a28d10d416db7a770fc03287 upstream.

The list mp->mglist is used to indicate whether a multicast group
is active on the bridge interface itself as opposed to one of the
constituent interfaces in the bridge.

Unfortunately the operation that adds the mp->mglist node to the
list neglected to check whether it has already been added. This
leads to list corruption in the form of nodes pointing to itself.

Normally this would be quite obvious as it would cause an infinite
loop when walking the list. However, as this list is never actually
walked (which means that we don't really need it, I'll get rid of
it in a subsequent patch), this instead is hidden until we perform
a delete operation on the affected nodes.

As the same node may now be pointed to by more than one node, the
delete operations can then cause modification of freed memory.

This was observed in practice to cause corruption in 512-byte slabs,
most commonly leading to crashes in jbd2.

Thanks to Josef Bacik for pointing me in the right direction.

Reported-by: Ian Page Hands <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
net/bridge/br_multicast.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index eaa0e1bae49b..ea4452f3dacb 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -532,7 +532,8 @@ static int br_multicast_add_group(struct net_bridge *br,
goto err;

if (!port) {
- hlist_add_head(&mp->mglist, &br->mglist);
+ if (hlist_unhashed(&mp->mglist))
+ hlist_add_head(&mp->mglist, &br->mglist);
mod_timer(&mp->timer, now + br->multicast_membership_interval);
goto out;
}
--
1.8.5.2

2014-02-05 21:17:29

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 004/213] crypto: ghash - Avoid null pointer dereference if no key is set

From: Nick Bowler <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 7ed47b7d142ec99ad6880bbbec51e9f12b3af74c upstream.

The ghash_update function passes a pointer to gf128mul_4k_lle which will
be NULL if ghash_setkey is not called or if the most recent call to
ghash_setkey failed to allocate memory. This causes an oops. Fix this
up by returning an error code in the null case.

This is trivially triggered from unprivileged userspace through the
AF_ALG interface by simply writing to the socket without setting a key.

The ghash_final function has a similar issue, but triggering it requires
a memory allocation failure in ghash_setkey _after_ at least one
successful call to ghash_update.

BUG: unable to handle kernel NULL pointer dereference at 00000670
IP: [<d88c92d4>] gf128mul_4k_lle+0x23/0x60 [gf128mul]
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ghash_generic gf128mul algif_hash af_alg nfs lockd nfs_acl sunrpc bridge ipv6 stp llc

Pid: 1502, comm: hashatron Tainted: G W 3.1.0-rc9-00085-ge9308cf #32 Bochs Bochs
EIP: 0060:[<d88c92d4>] EFLAGS: 00000202 CPU: 0
EIP is at gf128mul_4k_lle+0x23/0x60 [gf128mul]
EAX: d69db1f0 EBX: d6b8ddac ECX: 00000004 EDX: 00000000
ESI: 00000670 EDI: d6b8ddac EBP: d6b8ddc8 ESP: d6b8dda4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process hashatron (pid: 1502, ti=d6b8c000 task=d6810000 task.ti=d6b8c000)
Stack:
00000000 d69db1f0 00000163 00000000 d6b8ddc8 c101a520 d69db1f0 d52aa000
00000ff0 d6b8dde8 d88d310f d6b8a3f8 d52aa000 00001000 d88d502c d6b8ddfc
00001000 d6b8ddf4 c11676ed d69db1e8 d6b8de24 c11679ad d52aa000 00000000
Call Trace:
[<c101a520>] ? kmap_atomic_prot+0x37/0xa6
[<d88d310f>] ghash_update+0x85/0xbe [ghash_generic]
[<c11676ed>] crypto_shash_update+0x18/0x1b
[<c11679ad>] shash_ahash_update+0x22/0x36
[<c11679cc>] shash_async_update+0xb/0xd
[<d88ce0ba>] hash_sendpage+0xba/0xf2 [algif_hash]
[<c121b24c>] kernel_sendpage+0x39/0x4e
[<d88ce000>] ? 0xd88cdfff
[<c121b298>] sock_sendpage+0x37/0x3e
[<c121b261>] ? kernel_sendpage+0x4e/0x4e
[<c10b4dbc>] pipe_to_sendpage+0x56/0x61
[<c10b4e1f>] splice_from_pipe_feed+0x58/0xcd
[<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10
[<c10b51f5>] __splice_from_pipe+0x36/0x55
[<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10
[<c10b6383>] splice_from_pipe+0x51/0x64
[<c10b63c2>] ? default_file_splice_write+0x2c/0x2c
[<c10b63d5>] generic_splice_sendpage+0x13/0x15
[<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10
[<c10b527f>] do_splice_from+0x5d/0x67
[<c10b6865>] sys_splice+0x2bf/0x363
[<c129373b>] ? sysenter_exit+0xf/0x16
[<c104dc1e>] ? trace_hardirqs_on_caller+0x10e/0x13f
[<c129370c>] sysenter_do_call+0x12/0x32
Code: 83 c4 0c 5b 5e 5f c9 c3 55 b9 04 00 00 00 89 e5 57 8d 7d e4 56 53 8d 5d e4 83 ec 18 89 45 e0 89 55 dc 0f b6 70 0f c1 e6 04 01 d6 <f3> a5 be 0f 00 00 00 4e 89 d8 e8 48 ff ff ff 8b 45 e0 89 da 0f
EIP: [<d88c92d4>] gf128mul_4k_lle+0x23/0x60 [gf128mul] SS:ESP 0068:d6b8dda4
CR2: 0000000000000670
---[ end trace 4eaa2a86a8e2da24 ]---
note: hashatron[1502] exited with preempt_count 1
BUG: scheduling while atomic: hashatron/1502/0x10000002
INFO: lockdep is turned off.
[...]

Signed-off-by: Nick Bowler <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
crypto/ghash-generic.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/crypto/ghash-generic.c b/crypto/ghash-generic.c
index be4425616931..7835b8fc94db 100644
--- a/crypto/ghash-generic.c
+++ b/crypto/ghash-generic.c
@@ -67,6 +67,9 @@ static int ghash_update(struct shash_desc *desc,
struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
u8 *dst = dctx->buffer;

+ if (!ctx->gf128)
+ return -ENOKEY;
+
if (dctx->bytes) {
int n = min(srclen, dctx->bytes);
u8 *pos = dst + (GHASH_BLOCK_SIZE - dctx->bytes);
@@ -119,6 +122,9 @@ static int ghash_final(struct shash_desc *desc, u8 *dst)
struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm);
u8 *buf = dctx->buffer;

+ if (!ctx->gf128)
+ return -ENOKEY;
+
ghash_flush(ctx, dctx);
memcpy(dst, buf, GHASH_BLOCK_SIZE);

--
1.8.5.2

2014-02-05 21:17:48

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 008/213] KVM: lock slots_lock around device assignment

From: Alex Williamson <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 21a1416a1c945c5aeaeaf791b63c64926018eb77 upstream.

As pointed out by Jason Baron, when assigning a device to a guest
we first set the iommu domain pointer, which enables mapping
and unmapping of memory slots to the iommu. This leaves a window
where this path is enabled, but we haven't synchronized the iommu
mappings to the existing memory slots. Thus a slot being removed
at that point could send us down unexpected code paths removing
non-existent pinnings and iommu mappings. Take the slots_lock
around creating the iommu domain and initial mappings as well as
around iommu teardown to avoid this race.

Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
[PG: drop goto for EPERM check, 2.6.34 doesn't have that code]
Signed-off-by: Paul Gortmaker <[email protected]>
---
virt/kvm/iommu.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index ac765f648218..8c510edca9c9 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -174,18 +174,20 @@ int kvm_iommu_map_guest(struct kvm *kvm)
return -ENODEV;
}

+ mutex_lock(&kvm->slots_lock);
+
kvm->arch.iommu_domain = iommu_domain_alloc();
- if (!kvm->arch.iommu_domain)
- return -ENOMEM;
+ if (!kvm->arch.iommu_domain) {
+ r = -ENOMEM;
+ goto out_unlock;
+ }

r = kvm_iommu_map_memslots(kvm);
if (r)
- goto out_unmap;
-
- return 0;
+ kvm_iommu_unmap_memslots(kvm);

-out_unmap:
- kvm_iommu_unmap_memslots(kvm);
+out_unlock:
+ mutex_unlock(&kvm->slots_lock);
return r;
}

@@ -239,7 +241,11 @@ int kvm_iommu_unmap_guest(struct kvm *kvm)
if (!domain)
return 0;

+ mutex_lock(&kvm->slots_lock);
kvm_iommu_unmap_memslots(kvm);
+ kvm->arch.iommu_domain = NULL;
+ mutex_unlock(&kvm->slots_lock);
+
iommu_domain_free(domain);
return 0;
}
--
1.8.5.2

2014-02-05 20:03:25

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 001/213] x86, random: make ARCH_RANDOM prompt if EMBEDDED, not EXPERT

From: Romain Francoise <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

Before v2.6.38 CONFIG_EXPERT was known as CONFIG_EMBEDDED but the
Kconfig entry was not changed to match when upstream commit
628c6246d47b85f5357298601df2444d7f4dd3fd ("x86, random: Architectural
inlines to get random integers with RDRAND") was backported.

Signed-off-by: Romain Francoise <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
arch/x86/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0a813f82eb9b..b87027b463a3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1462,7 +1462,7 @@ config ARCH_USES_PG_UNCACHED

config ARCH_RANDOM
def_bool y
- prompt "x86 architectural random number generator" if EXPERT
+ prompt "x86 architectural random number generator" if EMBEDDED
---help---
Enable the x86 architectural RDRAND instruction
(Intel Bull Mountain technology) to generate random numbers.
--
1.8.5.2

2014-02-05 21:18:13

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 006/213] inotify: fix double free/corruption of stuct user

From: Eric Paris <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit d0de4dc584ec6aa3b26fffea320a8457827768fc upstream.

On an error path in inotify_init1 a normal user can trigger a double
free of struct user. This is a regression introduced by a2ae4cc9a16e
("inotify: stop kernel memory leak on file creation failure").

We fix this by making sure that if a group exists the user reference is
dropped when the group is cleaned up. We should not explictly drop the
reference on error and also drop the reference when the group is cleaned
up.

The new lifetime rules are that an inotify group lives from
inotify_new_group to the last fsnotify_put_group. Since the struct user
and inotify_devs are directly tied to this lifetime they are only
changed/updated in those two locations. We get rid of all special
casing of struct user or user->inotify_devs.

Signed-off-by: Eric Paris <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/notify/inotify/inotify_fsnotify.c | 1 +
fs/notify/inotify/inotify_user.c | 39 ++++++++++++------------------------
2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 5d3d2a782abc..29da76656b9c 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -150,6 +150,7 @@ static void inotify_free_group_priv(struct fsnotify_group *group)
idr_for_each(&group->inotify_data.idr, idr_callback, group);
idr_remove_all(&group->inotify_data.idr);
idr_destroy(&group->inotify_data.idr);
+ atomic_dec(&group->inotify_data.user->inotify_devs);
free_uid(group->inotify_data.user);
}

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 72f882552608..7b1a2c3e36c6 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -290,15 +290,12 @@ static int inotify_fasync(int fd, struct file *file, int on)
static int inotify_release(struct inode *ignored, struct file *file)
{
struct fsnotify_group *group = file->private_data;
- struct user_struct *user = group->inotify_data.user;

fsnotify_clear_marks_by_group(group);

/* free this group, matching get was inotify_init->fsnotify_obtain_group */
fsnotify_put_group(group);

- atomic_dec(&user->inotify_devs);
-
return 0;
}

@@ -616,7 +613,7 @@ retry:
return ret;
}

-static struct fsnotify_group *inotify_new_group(struct user_struct *user, unsigned int max_events)
+static struct fsnotify_group *inotify_new_group(unsigned int max_events)
{
struct fsnotify_group *group;
unsigned int grp_num;
@@ -632,8 +629,14 @@ static struct fsnotify_group *inotify_new_group(struct user_struct *user, unsign
spin_lock_init(&group->inotify_data.idr_lock);
idr_init(&group->inotify_data.idr);
group->inotify_data.last_wd = 0;
- group->inotify_data.user = user;
group->inotify_data.fa = NULL;
+ group->inotify_data.user = get_current_user();
+
+ if (atomic_inc_return(&group->inotify_data.user->inotify_devs) >
+ inotify_max_user_instances) {
+ fsnotify_put_group(group);
+ return ERR_PTR(-EMFILE);
+ }

return group;
}
@@ -643,7 +646,6 @@ static struct fsnotify_group *inotify_new_group(struct user_struct *user, unsign
SYSCALL_DEFINE1(inotify_init1, int, flags)
{
struct fsnotify_group *group;
- struct user_struct *user;
int ret;

/* Check the IN_* constants for consistency. */
@@ -653,31 +655,16 @@ SYSCALL_DEFINE1(inotify_init1, int, flags)
if (flags & ~(IN_CLOEXEC | IN_NONBLOCK))
return -EINVAL;

- user = get_current_user();
- if (unlikely(atomic_read(&user->inotify_devs) >=
- inotify_max_user_instances)) {
- ret = -EMFILE;
- goto out_free_uid;
- }
-
/* fsnotify_obtain_group took a reference to group, we put this when we kill the file in the end */
- group = inotify_new_group(user, inotify_max_queued_events);
- if (IS_ERR(group)) {
- ret = PTR_ERR(group);
- goto out_free_uid;
- }
-
- atomic_inc(&user->inotify_devs);
+ group = inotify_new_group(inotify_max_queued_events);
+ if (IS_ERR(group))
+ return PTR_ERR(group);

ret = anon_inode_getfd("inotify", &inotify_fops, group,
O_RDONLY | flags);
- if (ret >= 0)
- return ret;
+ if (ret < 0)
+ fsnotify_put_group(group);

- fsnotify_put_group(group);
- atomic_dec(&user->inotify_devs);
-out_free_uid:
- free_uid(user);
return ret;
}

--
1.8.5.2

2014-02-05 21:18:10

by Paul Gortmaker

[permalink] [raw]
Subject: [v2.6.34-stable 007/213] KVM: unmap pages from the iommu when slots are removed

From: Alex Williamson <[email protected]>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
If you see a problem with using this for longterm, please comment.
-------------------

commit 32f6daad4651a748a58a3ab6da0611862175722f upstream.

We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown. A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.

Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa. This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.

Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
[PG: minor tweak since 2.6.34 doesnt have kvm_for_each_memslot]
Signed-off-by: Paul Gortmaker <[email protected]>
---
include/linux/kvm_host.h | 6 ++++++
virt/kvm/iommu.c | 8 ++++++--
virt/kvm/kvm_main.c | 5 +++--
3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 94cb72cfc2c3..5072583996f9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -454,6 +454,7 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);

#ifdef CONFIG_IOMMU_API
int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_iommu_unmap_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
int kvm_iommu_map_guest(struct kvm *kvm);
int kvm_iommu_unmap_guest(struct kvm *kvm);
int kvm_assign_device(struct kvm *kvm,
@@ -468,6 +469,11 @@ static inline int kvm_iommu_map_pages(struct kvm *kvm,
return 0;
}

+static inline void kvm_iommu_unmap_pages(struct kvm *kvm,
+ struct kvm_memory_slot *slot)
+{
+}
+
static inline int kvm_iommu_map_guest(struct kvm *kvm)
{
return -ENODEV;
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 80fd3ad3b2de..ac765f648218 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -212,6 +212,11 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
iommu_unmap_range(domain, gfn_to_gpa(base_gfn), PAGE_SIZE * npages);
}

+void kvm_iommu_unmap_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ kvm_iommu_put_pages(kvm, slot->base_gfn, slot->npages);
+}
+
static int kvm_iommu_unmap_memslots(struct kvm *kvm)
{
int i;
@@ -220,8 +225,7 @@ static int kvm_iommu_unmap_memslots(struct kvm *kvm)
slots = rcu_dereference(kvm->memslots);

for (i = 0; i < slots->nmemslots; i++) {
- kvm_iommu_put_pages(kvm, slots->memslots[i].base_gfn,
- slots->memslots[i].npages);
+ kvm_iommu_unmap_pages(kvm, &slots->memslots[i]);
}

return 0;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b624139aea6e..3d2974fab62e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -697,12 +697,13 @@ skip_lpage:
goto out_free;

#ifdef CONFIG_DMAR
- /* map the pages in iommu page table */
+ /* map/unmap the pages in iommu page table */
if (npages) {
r = kvm_iommu_map_pages(kvm, &new);
if (r)
goto out_free;
- }
+ } else
+ kvm_iommu_unmap_pages(kvm, &old);
#endif

r = -ENOMEM;
--
1.8.5.2