2015-11-24 22:47:19

by Ben Hutchings

Subject: [PATCH 3.2 00/52] 3.2.74-rc1 review

This is the start of the stable review cycle for the 3.2.74 release.
There are 52 patches in this series, which will be posted as responses
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Nov 27 00:00:00 UTC 2015.
Anything received after that time might be too late.

A combined patch relative to 3.2.73 will be posted as an additional
response to this. A shortlog and diffstat can be found below.

Ben.

-------------

Alex Williamson (2):
PCI: Fix devfn for VPD access through function 0
[9d9240756e63dd87d6cbf5da8b98ceb8f8192b55]
PCI: Use function 0 VPD for identical functions, regular VPD for others
[da2d03ea27f6ed9d2005a67b20dd021ddacf1e4d]

Ani Sinha (1):
ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context.
[44f49dd8b5a606870a1f21101522a0f9c4414784]

Arnd Bergmann (1):
ARM: pxa: remove incorrect __init annotation on pxa27x_set_pwrmode
[54c09889bff6d99c8733eed4a26c9391b177c88b]

Bart Van Assche (1):
scsi: Fix a bdi reregistration race
[bf2cf3baa20b0a6cd2d08707ef05dc0e992a8aa0]

Boris BREZILLON (1):
mtd: mtdpart: fix add_mtd_partitions error path
[e5bae86797141e4a95e42d825f737cb36d7b8c37]

Borislav Petkov (1):
x86/cpu: Call verify_cpu() after having entered long mode too
[04633df0c43d710e5f696b06539c100898678235]

Brian Norris (1):
mtd: blkdevs: fix potential deadlock + lockdep warnings
[f3c63795e90f0c6238306883b6c72f14d5355721]

Chen Yu (1):
ACPI: Use correct IRQ when uninstalling ACPI interrupt handler
[49e4b84333f338d4f183f28f1f3c1131b9fb2b5a]

Chris Mason (1):
Btrfs: don't use ram_bytes for uncompressed inline items
[514ac8ad8793a097c0c9d89202c642479d6dfa34]

Christoph Hellwig (1):
scsi: restart list search after unlock in scsi_remove_target
[40998193560dab6c3ce8d25f4fa58a23e252ef38]

Christophe Leroy (1):
splice: sendfile() at once fails for big files
[0ff28d9f4674d781e492bcff6f32f0fe48cf0fed]

Daeho Jeong (1):
ext4, jbd2: ensure entering into panic after recording an error in superblock
[4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5]

Dan Carpenter (3):
devres: fix a for loop bounds check
[1f35d04a02a652f14566f875aef3a6f2af4cb77b]
irda: precedence bug in irlmp_seq_hb_idx()
[50010c20597d14667eff0fdb628309986f195230]
mwifiex: fix mwifiex_rdeeprom_read()
[1f9c6e1bc1ba5f8a10fcd6e99d170954d7c6d382]

David Howells (1):
FS-Cache: Handle a write to the page immediately beyond the EOF marker
[102f4d900c9c8f5ed89ae4746d493fe3ebd7ba64]

David Woodhouse (1):
iommu/vt-d: Fix ATSR handling for Root-Complex integrated endpoints
[d14053b3c714178525f22660e6aaf41263d00056]

Dmitry Tunin (2):
Bluetooth: ath3k: Add new AR3012 0930:021c id
[cd355ff071cd37e7197eccf9216770b2b29369f7]
Bluetooth: ath3k: Add support of AR3012 0cf3:817b device
[18e0afab8ce3f1230ce3fef52b2e73374fd9c0e7]

Eric Dumazet (3):
net: avoid NULL deref in inet_ctl_sock_destroy()
[8fa677d2706d325d71dab91bf6e6512c05214e37]
net: fix a race in dst_release()
[d69bbf88c8d0b367cf3e3a052f6daadf630ee566]
packet: fix match_fanout_group()
[161642e24fee40fba2c5bc2ceacc00d118a22d65]

Filipe Manana (5):
Btrfs: fix file corruption and data loss after cloning inline extents
[8039d87d9e473aeb740d4fdbd59b9d2f89b2ced9]
Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow
[1d512cb77bdbda80f0dd0620a3b260d697fd581d]
Btrfs: fix race leading to incorrect item deletion when dropping extents
[aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c]
Btrfs: fix race when listing an inode's xattrs
[f1cd1f0b7d1b5d4aaa5711e8f4e4898b0045cb6d]
Btrfs: fix truncation of compressed and inlined extents
[0305cd5f7fca85dae392b9ba85b116896eb7c1c7]

Herbert Xu (1):
crypto: algif_hash - Only export and import on sockets with data
[4afa5f9617927453ac04b24b584f6c718dfb4f45]

Jan Schmidt (1):
Btrfs: added helper btrfs_next_item()
[c7d22a3c3cdb73d8a0151e2ccc8cf4a48c48310b]

Jann Horn (1):
fs: if a coredump already exists, unlink and recreate with O_EXCL
[fbb1816942c04429e85dbf4c1a080accc534299e]

Johannes Berg (1):
mac80211: fix driver RSSI event calculations
[8ec6d97871f37e4743678ea4a455bd59580aa0f4]

Kees Cook (1):
fs: make dumpable=2 require fully qualified path
[9520628e8ceb69fa9a4aee6b57f22675d9e1b709]

Kinglong Mee (2):
FS-Cache: Don't override netfs's primary_index if registering failed
[b130ed5998e62879a66bad08931a2b5e832da95c]
FS-Cache: Increase reference of parent after registering, netfs success
[86108c2e34a26e4bec3c6ddb23390bf8cedcf391]

Larry Finger (1):
staging: rtl8712: Add device ID for Sitecom WLA2100
[1e6e63283691a2a9048a35d9c6c59cf0abd342e4]

Libin (1):
recordmcount: Fix endianness handling bug for nop_mcount
[c84da8b9ad3761eef43811181c7e896e9834b26b]

Maciej W. Rozycki (1):
binfmt_elf: Don't clobber passed executable's file header
[b582ef5c53040c5feef4c96a8f9585b6831e2441]

Marek Vasut (1):
can: Use correct type in sizeof() in nla_put()
[562b103a21974c2f9cd67514d110f918bb3e1796]

Michal Kubeček (1):
ipv6: fix tunnel error handling
[ebac62fe3d24c0ce22dd83afa7b07d1a2aaef44d]

Paolo Bonzini (1):
KVM: svm: unconditionally intercept #DB
[cbdb967af3d54993f5814f1cee0ed311a055377d]

Peter Oberparleiter (1):
scsi_sysfs: Fix queue_ramp_up_period return code
[863e02d0e173bb9d8cea6861be22820b25c076cc]

Peter Zijlstra (1):
perf: Fix inherited events vs. tracepoint filters
[b71b437eedaed985062492565d9d421d975ae845]

Ralf Baechle (1):
MIPS: atomic: Fix comment describing atomic64_add_unless's return value.
[f0a232cde7be18a207fd057dd79bbac8a0a45dec]

Richard Purdie (1):
HID: core: Avoid uninitialized buffer access
[79b568b9d0c7c5d81932f4486d50b38efdd6da6d]

Sowmini Varadhan (1):
RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv
[8ce675ff39b9958d1c10f86cf58e357efaafc856]

Stefan Richter (1):
firewire: ohci: fix JMicron JMB38x IT context discovery
[100ceb66d5c40cc0c7018e06a9474302470be73c]

[email protected] (2):
megaraid_sas : SMAP restriction--do not access user memory from IOCTL code
[323c4a02c631d00851d8edc4213c4d184ef83647]
megaraid_sas: Do not use PAGE_SIZE for max_sectors
[357ae967ad66e357f78b5cfb5ab6ca07fb4a7758]

Takashi Iwai (2):
ALSA: hda - Apply pin fixup for HP ProBook 6550b
[c932b98c1e47312822d911c1bb76e81ef50e389c]
ALSA: hda - Disable 64bit address for Creative HDA controllers
[cadd16ea33a938d49aee99edd4758cc76048b399]

Valentin Rothberg (1):
wm831x_power: Use IRQF_ONESHOT to request threaded IRQs
[90adf98d9530054b8e665ba5a928de4307231d84]

Documentation/sysctl/fs.txt | 18 ++-
Makefile | 4 +-
arch/arm/mach-pxa/include/mach/pxa27x.h | 2 +-
arch/arm/mach-pxa/pxa27x.c | 2 +-
arch/mips/include/asm/atomic.h | 2 +-
arch/x86/kernel/head_64.S | 8 ++
arch/x86/kernel/verify_cpu.S | 12 +-
arch/x86/kvm/svm.c | 25 +---
crypto/algif_hash.c | 12 +-
drivers/acpi/osl.c | 9 +-
drivers/bluetooth/ath3k.c | 4 +
drivers/bluetooth/btusb.c | 2 +
drivers/firewire/ohci.c | 5 +
drivers/hid/hid-core.c | 2 +-
drivers/iommu/intel-iommu.c | 7 +-
drivers/mtd/mtd_blkdevs.c | 10 +-
drivers/mtd/mtdpart.c | 4 +-
drivers/net/can/dev.c | 2 +-
drivers/net/wireless/mwifiex/debugfs.c | 14 +--
drivers/pci/access.c | 27 +----
drivers/pci/quirks.c | 20 +++-
drivers/power/wm831x_power.c | 6 +-
drivers/scsi/megaraid/megaraid_sas.h | 2 +
drivers/scsi/megaraid/megaraid_sas_base.c | 15 ++-
drivers/scsi/scsi_sysfs.c | 32 ++---
drivers/staging/rtl8712/usb_intf.c | 1 +
fs/binfmt_elf.c | 10 +-
fs/btrfs/ctree.h | 39 +++++--
fs/btrfs/file.c | 18 ++-
fs/btrfs/inode.c | 91 ++++++++++++---
fs/btrfs/ioctl.c | 188 ++++++++++++++++++++++++------
fs/btrfs/print-tree.c | 2 +-
fs/btrfs/tree-log.c | 2 +-
fs/btrfs/xattr.c | 4 +-
fs/cachefiles/rdwr.c | 78 +++++++------
fs/exec.c | 49 +++++++-
fs/ext4/super.c | 12 +-
fs/fscache/netfs.c | 34 +++---
fs/fscache/page.c | 2 +-
fs/jbd2/journal.c | 6 +-
fs/splice.c | 12 +-
include/linux/acpi.h | 6 +
include/linux/jbd2.h | 1 +
include/net/inet_common.h | 3 +-
kernel/events/core.c | 4 +
lib/devres.c | 2 +-
net/core/dst.c | 2 +-
net/ipv4/ipmr.c | 4 +-
net/ipv6/tunnel6.c | 12 +-
net/irda/irlmp.c | 2 +-
net/mac80211/mlme.c | 2 +-
net/packet/af_packet.c | 6 +-
net/rds/tcp_recv.c | 11 +-
scripts/recordmcount.h | 2 +-
sound/pci/hda/hda_intel.c | 2 +
sound/pci/hda/patch_sigmatel.c | 1 +
56 files changed, 596 insertions(+), 258 deletions(-)

--
Ben Hutchings
Unix is many things to many people,
but it's never been everything to anybody.


2015-11-24 22:35:48

by Ben Hutchings

Subject: [PATCH 3.2 17/52] Bluetooth: ath3k: Add new AR3012 0930:021c id

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Tunin <[email protected]>

commit cd355ff071cd37e7197eccf9216770b2b29369f7 upstream.

This adapter works with the existing linux-firmware.

T: Bus=01 Lev=01 Prnt=01 Port=03 Cnt=02 Dev#= 3 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0930 ProdID=021c Rev=00.01
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

BugLink: https://bugs.launchpad.net/bugs/1502781

Signed-off-by: Dmitry Tunin <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/bluetooth/ath3k.c | 2 ++
drivers/bluetooth/btusb.c | 1 +
2 files changed, 3 insertions(+)

--- a/drivers/bluetooth/ath3k.c
+++ b/drivers/bluetooth/ath3k.c
@@ -90,6 +90,7 @@ static struct usb_device_id ath3k_table[
{ USB_DEVICE(0x04CA, 0x300f) },
{ USB_DEVICE(0x04CA, 0x3010) },
{ USB_DEVICE(0x0930, 0x0219) },
+ { USB_DEVICE(0x0930, 0x021c) },
{ USB_DEVICE(0x0930, 0x0220) },
{ USB_DEVICE(0x0930, 0x0227) },
{ USB_DEVICE(0x0b05, 0x17d0) },
@@ -148,6 +149,7 @@ static struct usb_device_id ath3k_blist_
{ USB_DEVICE(0x04ca, 0x300f), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x04ca, 0x3010), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0930, 0x021c), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0930, 0x0220), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0930, 0x0227), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0b05, 0x17d0), .driver_info = BTUSB_ATH3012 },
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -175,6 +175,7 @@ static struct usb_device_id blacklist_ta
{ USB_DEVICE(0x04ca, 0x300f), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x04ca, 0x3010), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0930, 0x021c), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0930, 0x0220), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0930, 0x0227), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0b05, 0x17d0), .driver_info = BTUSB_ATH3012 },

2015-11-24 22:35:45

by Ben Hutchings

Subject: [PATCH 3.2 13/52] ARM: pxa: remove incorrect __init annotation on pxa27x_set_pwrmode

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <[email protected]>

commit 54c09889bff6d99c8733eed4a26c9391b177c88b upstream.

The z2 machine calls pxa27x_set_pwrmode() in order to power off
the machine, but this function gets discarded early at boot because
it is marked __init, as pointed out by kbuild:

WARNING: vmlinux.o(.text+0x145c4): Section mismatch in reference from the function z2_power_off() to the function .init.text:pxa27x_set_pwrmode()
The function z2_power_off() references
the function __init pxa27x_set_pwrmode().
This is often because z2_power_off lacks a __init
annotation or the annotation of pxa27x_set_pwrmode is wrong.

This removes the __init section modifier to fix rebooting and the
build error.

Signed-off-by: Arnd Bergmann <[email protected]>
Fixes: ba4a90a6d86a ("ARM: pxa/z2: fix building error of pxa27x_cpu_suspend() no longer available")
Signed-off-by: Robert Jarzmik <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/arm/mach-pxa/include/mach/pxa27x.h | 2 +-
arch/arm/mach-pxa/pxa27x.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm/mach-pxa/include/mach/pxa27x.h
+++ b/arch/arm/mach-pxa/include/mach/pxa27x.h
@@ -21,7 +21,7 @@

extern void __init pxa27x_map_io(void);
extern void __init pxa27x_init_irq(void);
-extern int __init pxa27x_set_pwrmode(unsigned int mode);
+extern int pxa27x_set_pwrmode(unsigned int mode);
extern void pxa27x_cpu_pm_enter(suspend_state_t state);

#define pxa27x_handle_irq ichp_handle_irq
--- a/arch/arm/mach-pxa/pxa27x.c
+++ b/arch/arm/mach-pxa/pxa27x.c
@@ -241,7 +241,7 @@ static struct clk_lookup pxa27x_clkregs[
*/
static unsigned int pwrmode = PWRMODE_SLEEP;

-int __init pxa27x_set_pwrmode(unsigned int mode)
+int pxa27x_set_pwrmode(unsigned int mode)
{
switch (mode) {
case PWRMODE_SLEEP:

2015-11-24 22:36:29

by Ben Hutchings

Subject: [PATCH 3.2 32/52] scsi: restart list search after unlock in scsi_remove_target

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <[email protected]>

commit 40998193560dab6c3ce8d25f4fa58a23e252ef38 upstream.

When dropping a lock while iterating a list we must restart the search
as other threads could have manipulated the list under us. Without this
we can get stuck in an endless loop. This bug was introduced by

commit bc3f02a795d3b4faa99d37390174be2a75d091bd
Author: Dan Williams <[email protected]>
Date: Tue Aug 28 22:12:10 2012 -0700

[SCSI] scsi_remove_target: fix softlockup regression on hot remove

Which was itself trying to fix a reported soft lockup issue

http://thread.gmane.org/gmane.linux.kernel/1348679

However, we believe even with this revert of the original patch, the soft
lockup problem has been fixed by

commit f2495e228fce9f9cec84367547813cbb0d6db15a
Author: James Bottomley <[email protected]>
Date: Tue Jan 21 07:01:41 2014 -0800

[SCSI] dual scan thread bug fix

Thanks go to Dan Williams <[email protected]> for tracking all this
prior history down.

Reported-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Fixes: bc3f02a795d3b4faa99d37390174be2a75d091bd
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/scsi/scsi_sysfs.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)

--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1029,31 +1029,23 @@ static void __scsi_remove_target(struct
void scsi_remove_target(struct device *dev)
{
struct Scsi_Host *shost = dev_to_shost(dev->parent);
- struct scsi_target *starget, *last = NULL;
+ struct scsi_target *starget;
unsigned long flags;

- /* remove targets being careful to lookup next entry before
- * deleting the last
- */
+restart:
spin_lock_irqsave(shost->host_lock, flags);
list_for_each_entry(starget, &shost->__targets, siblings) {
if (starget->state == STARGET_DEL)
continue;
if (starget->dev.parent == dev || &starget->dev == dev) {
- /* assuming new targets arrive at the end */
kref_get(&starget->reap_ref);
spin_unlock_irqrestore(shost->host_lock, flags);
- if (last)
- scsi_target_reap(last);
- last = starget;
__scsi_remove_target(starget);
- spin_lock_irqsave(shost->host_lock, flags);
+ scsi_target_reap(starget);
+ goto restart;
}
}
spin_unlock_irqrestore(shost->host_lock, flags);
-
- if (last)
- scsi_target_reap(last);
}
EXPORT_SYMBOL(scsi_remove_target);
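
The restart-after-unlock pattern in the hunk above can be sketched in plain C. This is a single-threaded toy, not the SCSI code: struct node, struct host, the boolean "lock" and remove_unlocked() are all hypothetical stand-ins chosen for illustration. The point it tries to show is that once the lock is dropped the loop cursor can no longer be trusted, so the walk starts again from the head, and that it still terminates because each pass marks one more node as deleted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node; "deleted" plays the role of STARGET_DEL. */
struct node {
	struct node *next;
	int parent;
	bool deleted;
};

/* Hypothetical host; "locked" is a toy stand-in for host_lock. */
struct host {
	struct node *head;
	bool locked;
};

static void host_lock(struct host *h)   { h->locked = true; }
static void host_unlock(struct host *h) { h->locked = false; }

/* Work that must run without the lock held; here it only marks the node. */
static void remove_unlocked(struct node *n) { n->deleted = true; }

/* Remove every node belonging to "parent"; returns how many were removed. */
static int remove_children(struct host *h, int parent)
{
	struct node *n;
	int removed = 0;

restart:
	host_lock(h);
	for (n = h->head; n; n = n->next) {
		if (n->deleted)
			continue;
		if (n->parent == parent) {
			host_unlock(h);
			remove_unlocked(n);
			removed++;
			/* The list may have changed while unlocked: start over. */
			goto restart;
		}
	}
	host_unlock(h);
	return removed;
}
```

Restarting costs extra passes over the list, but it avoids relying on a cursor that another thread may have invalidated while the lock was released.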

2015-11-24 22:36:38

by Ben Hutchings

Subject: [PATCH 3.2 46/52] fs: make dumpable=2 require fully qualified path

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <[email protected]>

commit 9520628e8ceb69fa9a4aee6b57f22675d9e1b709 upstream.

When the suid_dumpable sysctl is set to "2", and there is no core dump
pipe defined in the core_pattern sysctl, a local user can cause core files
to be written to root-writable directories, potentially with
user-controlled content.

This means an admin can unknowingly reintroduce a variation of
CVE-2006-2451, allowing local users to gain root privileges.

$ cat /proc/sys/fs/suid_dumpable
2
$ cat /proc/sys/kernel/core_pattern
core
$ ulimit -c unlimited
$ cd /
$ ls -l core
ls: cannot access core: No such file or directory
$ touch core
touch: cannot touch `core': Permission denied
$ OHAI="evil-string-here" ping localhost >/dev/null 2>&1 &
$ pid=$!
$ sleep 1
$ kill -SEGV $pid
$ ls -l core
-rw------- 1 root kees 458752 Jun 21 11:35 core
$ sudo strings core | grep evil
OHAI=evil-string-here

While cron has been fixed to abort reading a file when there is any
parse error, there are still other sensitive directories that will read
any file present and skip unparsable lines.

Instead of introducing a suid_dumpable=3 mode and breaking all users of
mode 2, this only disables the unsafe portion of mode 2 (writing to disk
via relative path). Most users of mode 2 (e.g. Chrome OS) already use
a core dump pipe handler, so this change will not break them. For the
situations where a pipe handler is not defined but mode 2 is still
active, crash dumps will only be written to fully qualified paths. If a
relative path is defined (e.g. the default "core" pattern), dump
attempts will trigger a printk yelling about the lack of a fully
qualified path.

Signed-off-by: Kees Cook <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: James Morris <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
Documentation/sysctl/fs.txt | 18 ++++++++++++------
fs/exec.c | 17 ++++++++++++++---
2 files changed, 26 insertions(+), 9 deletions(-)

--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -163,16 +163,22 @@ This value can be used to query and set
or otherwise protected/tainted binaries. The modes are

0 - (default) - traditional behaviour. Any process which has changed
- privilege levels or is execute only will not be dumped
+ privilege levels or is execute only will not be dumped.
1 - (debug) - all processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is
intended for system debugging situations only. Ptrace is unchecked.
+ This is insecure as it allows regular users to examine the memory
+ contents of privileged processes.
2 - (suidsafe) - any binary which normally would not be dumped is dumped
- readable by root only. This allows the end user to remove
- such a dump but not access it directly. For security reasons
- core dumps in this mode will not overwrite one another or
- other files. This mode is appropriate when administrators are
- attempting to debug problems in a normal environment.
+ anyway, but only if the "core_pattern" kernel sysctl is set to
+ either a pipe handler or a fully qualified path. (For more details
+ on this limitation, see CVE-2006-2451.) This mode is appropriate
+ when administrators are attempting to debug problems in a normal
+ environment, and either have a core dump pipe handler that knows
+ to treat privileged core dumps with care, or specific directory
+ defined for catching core dumps. If a core dump happens without
+ a pipe handler or fully qualifid path, a message will be emitted
+ to syslog warning about the lack of a correct setting.

==============================================================

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -2136,6 +2136,7 @@ void do_coredump(long signr, int exit_co
int retval = 0;
int flag = 0;
int ispipe;
+ bool need_nonrelative = false;
static atomic_t core_dump_count = ATOMIC_INIT(0);
struct coredump_params cprm = {
.signr = signr,
@@ -2161,14 +2162,16 @@ void do_coredump(long signr, int exit_co
if (!cred)
goto fail;
/*
- * We cannot trust fsuid as being the "true" uid of the
- * process nor do we know its entire history. We only know it
- * was tainted so we dump it as root in mode 2.
+ * We cannot trust fsuid as being the "true" uid of the process
+ * nor do we know its entire history. We only know it was tainted
+ * so we dump it as root in mode 2, and only into a controlled
+ * environment (pipe handler or fully qualified path).
*/
if (__get_dumpable(cprm.mm_flags) == 2) {
/* Setuid core dump mode */
flag = O_EXCL; /* Stop rewrite attacks */
cred->fsuid = 0; /* Dump root private */
+ need_nonrelative = true;
}

retval = coredump_wait(exit_code, &core_state);
@@ -2248,6 +2251,14 @@ void do_coredump(long signr, int exit_co
if (cprm.limit < binfmt->min_coredump)
goto fail_unlock;

+ if (need_nonrelative && cn.corename[0] != '/') {
+ printk(KERN_WARNING "Pid %d(%s) can only dump core "\
+ "to fully qualified path!\n",
+ task_tgid_vnr(current), current->comm);
+ printk(KERN_WARNING "Skipping core dump\n");
+ goto fail_unlock;
+ }
+
cprm.file = filp_open(cn.corename,
O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
0600);
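
The added check in the second fs/exec.c hunk boils down to a one-line predicate. The sketch below isolates it as a userspace function; core_path_allowed() is a hypothetical name that does not exist in the kernel, and the sketch deliberately ignores the pipe-handler case, which the commit message says keeps working in mode 2.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the condition added above: when the dump
 * is tainted (mode 2), only a path starting with '/' is acceptable.
 */
static bool core_path_allowed(const char *corename, bool need_nonrelative)
{
	if (need_nonrelative && corename[0] != '/')
		return false;	/* relative path: the dump is skipped */
	return true;
}
```

Under these assumptions the default "core" pattern is rejected for a mode 2 dump, while an absolute pattern such as "/var/crash/core" (an example path, not one taken from the patch) is accepted.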

2015-11-24 22:36:23

by Ben Hutchings

Subject: [PATCH 3.2 48/52] irda: precedence bug in irlmp_seq_hb_idx()

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

[ Upstream commit 50010c20597d14667eff0fdb628309986f195230 ]

This is decrementing the pointer, instead of the value stored in the
pointer. KASan detects it as an out of bounds reference.

Reported-by: "Berry Cheng 程君(成淼)" <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/irda/irlmp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/irda/irlmp.c
+++ b/net/irda/irlmp.c
@@ -1868,7 +1868,7 @@ static void *irlmp_seq_hb_idx(struct irl
for (element = hashbin_get_first(iter->hashbin);
element != NULL;
element = hashbin_get_next(iter->hashbin)) {
- if (!off || *off-- == 0) {
+ if (!off || (*off)-- == 0) {
/* NB: hashbin left locked */
return element;
}
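
The one-character difference in the hunk above is easy to miss, so here is a small userspace sketch of both forms. pick_buggy() and pick_fixed() are hypothetical names; they only imitate the shape of the loop in irlmp_seq_hb_idx(). Postfix -- binds tighter than unary *, so "*off--" reads the current value and then moves the pointer, whereas "(*off)--" decrements the counter the pointer refers to.

```c
/* Buggy form: each iteration steps the pointer backwards through memory. */
static int pick_buggy(int nelems, long *off)
{
	for (int i = 0; i < nelems; i++)
		if (!off || *off-- == 0)
			return i;
	return -1;
}

/* Fixed form: the counter behind the pointer counts down to zero. */
static int pick_fixed(int nelems, long *off)
{
	for (int i = 0; i < nelems; i++)
		if (!off || (*off)-- == 0)
			return i;
	return -1;
}
```

To keep the buggy variant within defined behaviour when experimenting, pass it a pointer to the last element of a small array: with long slots[3] = {7, 0, 2} and off = &slots[2], pick_buggy() stops at index 1 because it wandered onto slots[1], while pick_fixed() counts slots[2] down and stops at index 2.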

2015-11-24 22:36:43

by Ben Hutchings

Subject: [PATCH 3.2 34/52] Btrfs: fix race leading to incorrect item deletion when dropping extents

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Filipe Manana <[email protected]>

commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.

While running a stress test I got the following warning triggered:

[191627.672810] ------------[ cut here ]------------
[191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
(...)
[191627.701485] Call Trace:
[191627.702037] [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[191627.702992] [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
[191627.704091] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[191627.705380] [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.706637] [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
[191627.707789] [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.709155] [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
[191627.712444] [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
[191627.714162] [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
[191627.715887] [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
[191627.717287] [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
[191627.728865] [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
[191627.730045] [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
[191627.731256] [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[191627.732661] [<ffffffff81061119>] process_one_work+0x24c/0x4ae
[191627.733822] [<ffffffff810615b0>] worker_thread+0x206/0x2c2
[191627.734857] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.736052] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.737349] [<ffffffff810669a6>] kthread+0xef/0xf7
[191627.738267] [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
[191627.739330] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.741976] [<ffffffff81465592>] ret_from_fork+0x42/0x70
[191627.743080] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.744206] ---[ end trace bbfddacb7aaada8d ]---

$ cat -n fs/btrfs/file.c
691 int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
(...)
758 btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
759 if (key.objectid > ino ||
760 key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
761 break;
762
763 fi = btrfs_item_ptr(leaf, path->slots[0],
764 struct btrfs_file_extent_item);
765 extent_type = btrfs_file_extent_type(leaf, fi);
766
767 if (extent_type == BTRFS_FILE_EXTENT_REG ||
768 extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
(...)
774 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
(...)
778 } else {
779 WARN_ON(1);
780 extent_end = search_start;
781 }
(...)

This happened because the item we were processing did not match a file
extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
case we cast the item to a struct btrfs_file_extent_item pointer and
then find a type field value that does not match any of the expected
values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
due to a tiny time window where a race can happen as exemplified below.
For example, consider the following scenario where we're using the
NO_HOLES feature and we have the following two neighbour leafs:

Leaf X (has N items) Leaf Y

[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 EXTENT_DATA 8192), ... ]
slot N - 2 slot N - 1 slot 0

Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
than explicit because NO_HOLES is enabled). Now if our inode has an
ordered extent for the range [4K, 8K[ that is finishing, the following
can happen:

CPU 1 CPU 2

btrfs_finish_ordered_io()
insert_reserved_file_extent()
__btrfs_drop_extents()
Searches for the key
(257 EXTENT_DATA 4096) through
btrfs_lookup_file_extent()

Key not found and we get a path where
path->nodes[0] == leaf X and
path->slots[0] == N

Because path->slots[0] is >=
btrfs_header_nritems(leaf X), we call
btrfs_next_leaf()

btrfs_next_leaf() releases the path

inserts key
(257 INODE_REF 4096)
at the end of leaf X,
leaf X now has N + 1 keys,
and the new key is at
slot N

btrfs_next_leaf() searches for
key (257 INODE_REF 256), with
path->keep_locks set to 1,
because it was the last key it
saw in leaf X

finds it in leaf X again and
notices it's no longer the last
key of the leaf, so it returns 0
with path->nodes[0] == leaf X and
path->slots[0] == N (which is now
< btrfs_header_nritems(leaf X)),
pointing to the new key
(257 INODE_REF 4096)

__btrfs_drop_extents() casts the
item at path->nodes[0], slot
path->slots[0], to a struct
btrfs_file_extent_item - it does
not skip keys for the target
inode with a type less than
BTRFS_EXTENT_DATA_KEY
(BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)

sees a bogus value for the type
field triggering the WARN_ON in
the trace shown above, and sets
extent_end = search_start (4096)

does the if-then-else logic to
fixup 0 length extent items created
by a past bug from hole punching:

if (extent_end == key.offset &&
extent_end >= search_start)
goto delete_extent_item;

that evaluates to true and it ends
up deleting the key pointed to by
path->slots[0], (257 INODE_REF 4096),
from leaf X

The same could happen for example for a xattr that ends up having a key
with an offset value that matches search_start (very unlikely but not
impossible).

So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
skipped, never casted to struct btrfs_file_extent_item and never deleted
by accident. Also protect against the unexpected case of getting a key
for a lower inode number by skipping that key and issuing a warning.

Signed-off-by: Filipe Manana <[email protected]>
[bwh: Backported to 3.2: drop use of ASSERT()]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/btrfs/file.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -605,8 +605,15 @@ next_slot:
}

btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
- if (key.objectid > ino ||
- key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
+
+ if (key.objectid > ino)
+ break;
+ if (WARN_ON_ONCE(key.objectid < ino) ||
+ key.type < BTRFS_EXTENT_DATA_KEY) {
+ path->slots[0]++;
+ goto next_slot;
+ }
+ if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
break;

fi = btrfs_item_ptr(leaf, path->slots[0],
@@ -625,8 +632,8 @@ next_slot:
btrfs_file_extent_inline_len(leaf,
path->slots[0], fi);
} else {
- WARN_ON(1);
- extent_end = search_start;
+ /* can't happen */
+ BUG();
}

if (extent_end <= search_start) {
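
The reordered key checks in the first hunk above amount to a three-way decision per item: stop, skip, or process. The sketch below restates that decision next to the old two-way check. Everything in it is a hypothetical simplification: struct key, the enum, and the function names are made up for illustration, and the numeric key types are written from memory, with only their relative order mattering here.

```c
/* Illustrative key types; only INODE_REF < EXTENT_DATA matters here. */
#define INODE_REF_KEY   12
#define EXTENT_DATA_KEY 108

enum key_action { KEY_STOP, KEY_SKIP, KEY_PROCESS };

/* Hypothetical, simplified stand-in for a btrfs key. */
struct key {
	unsigned long long objectid;
	unsigned char type;
	unsigned long long offset;
};

/* Old check: anything not past the end is treated as a file extent item. */
static enum key_action classify_old(const struct key *k,
				    unsigned long long ino,
				    unsigned long long end)
{
	if (k->objectid > ino || k->type > EXTENT_DATA_KEY || k->offset >= end)
		return KEY_STOP;
	return KEY_PROCESS;
}

/* New check: keys sorting before EXTENT_DATA for this inode are skipped. */
static enum key_action classify_new(const struct key *k,
				    unsigned long long ino,
				    unsigned long long end)
{
	if (k->objectid > ino)
		return KEY_STOP;
	if (k->objectid < ino || k->type < EXTENT_DATA_KEY)
		return KEY_SKIP;
	if (k->type > EXTENT_DATA_KEY || k->offset >= end)
		return KEY_STOP;
	return KEY_PROCESS;
}
```

For the racing key from the scenario in the commit message, (257 INODE_REF 4096) with ino 257 and end 8192, classify_old() says "process" and classify_new() says "skip", which is the behavioural change the patch is after. The sketch leaves out the warning the real code issues for a lower inode number.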

2015-11-24 22:36:49

by Ben Hutchings

Subject: [PATCH 3.2 37/52] scsi_sysfs: Fix queue_ramp_up_period return code

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Peter Oberparleiter <[email protected]>

commit 863e02d0e173bb9d8cea6861be22820b25c076cc upstream.

Writing a number to /sys/bus/scsi/devices/<sdev>/queue_ramp_up_period
returns the value of that number instead of the number of bytes written.
This behavior can confuse programs expecting POSIX write() semantics.
Fix this by returning the number of bytes written instead.

Signed-off-by: Peter Oberparleiter <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Matthew R. Ochs <[email protected]>
Reviewed-by: Ewan D. Milne <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/scsi/scsi_sysfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -788,7 +788,7 @@ sdev_store_queue_ramp_up_period(struct d
return -EINVAL;

sdev->queue_ramp_up_period = msecs_to_jiffies(period);
- return period;
+ return count;
}

static struct device_attribute sdev_attr_queue_ramp_up_period =
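
Why the return value matters can be shown with a userspace imitation. The sketch below is not the sysfs code: store_buggy(), store_fixed() and write_all() are hypothetical, and the parsing via sscanf() is an assumption made to keep the example short. write_all() behaves like a careful caller that treats a short return as "only that many bytes were consumed" and retries with the rest.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

static unsigned int ramp_up_period;	/* toy stand-in for the attribute */

/* Buggy: returns the parsed number instead of the bytes consumed. */
static long store_buggy(const char *buf, size_t count)
{
	unsigned int period;

	(void)count;
	if (sscanf(buf, "%u", &period) != 1)
		return -EINVAL;
	ramp_up_period = period;
	return period;
}

/* Fixed: reports that the whole buffer was consumed. */
static long store_fixed(const char *buf, size_t count)
{
	unsigned int period;

	if (sscanf(buf, "%u", &period) != 1)
		return -EINVAL;
	ramp_up_period = period;
	return (long)count;
}

/* A caller that retries after short writes, as write() semantics allow. */
static long write_all(long (*store)(const char *, size_t),
		      const char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		long n = store(buf + done, len - done);

		if (n < 0)
			return n;
		if (n == 0)
			break;	/* avoid spinning on a zero-length result */
		done += (size_t)n;
	}
	return (long)done;
}
```

In this toy, writing "1\n" through store_buggy() looks like a one-byte short write, so the caller retries with "\n" alone and gets -EINVAL, even though the value was already stored. With store_fixed() the same write completes in one call.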

2015-11-24 22:36:54

by Ben Hutchings

Subject: [PATCH 3.2 47/52] fs: if a coredump already exists, unlink and recreate with O_EXCL

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Jann Horn <[email protected]>

commit fbb1816942c04429e85dbf4c1a080accc534299e upstream.

It was possible for an attacking user to trick root (or another user) into
writing his coredumps into an attacker-readable, pre-existing file using
rename() or link(), causing the disclosure of secret data from the victim
process' virtual memory. Depending on the configuration, it was also
possible to trick root into overwriting system files with coredumps. Fix
that issue by never writing coredumps into existing files.

Requirements for the attack:
- The attack only applies if the victim's process has a nonzero
RLIMIT_CORE and is dumpable.
- The attacker can trick the victim into coredumping into an
attacker-writable directory D, either because the core_pattern is
relative and the victim's cwd is attacker-writable or because an
absolute core_pattern pointing to a world-writable directory is used.
- The attacker has one of these:
A: on a system with protected_hardlinks=0:
execute access to a folder containing a victim-owned,
attacker-readable file on the same partition as D, and the
victim-owned file will be deleted before the main part of the attack
takes place. (In practice, there are lots of files that fulfill
this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
This does not apply to most Linux systems because most distros set
protected_hardlinks=1.
B: on a system with protected_hardlinks=1:
execute access to a folder containing a victim-owned,
attacker-readable and attacker-writable file on the same partition
as D, and the victim-owned file will be deleted before the main part
of the attack takes place.
(This seems to be uncommon.)
C: on any system, independent of protected_hardlinks:
write access to a non-sticky folder containing a victim-owned,
attacker-readable file on the same partition as D
(This seems to be uncommon.)

The basic idea is that the attacker moves the victim-owned file to where
he expects the victim process to dump its core. The victim process dumps
its core into the existing file, and the attacker reads the coredump from
it.

If the attacker can't move the file because he does not have write access
to the containing directory, he can instead link the file to a directory
he controls, then wait for the original link to the file to be deleted
(because the kernel checks that the link count of the corefile is 1).

A less reliable variant that requires D to be non-sticky works with link()
and does not require deletion of the original link: link() the file into
D, but then unlink() it directly before the kernel performs the link count
check.

On systems with protected_hardlinks=0, this variant allows an attacker to
not only gain information from coredumps, but also clobber existing,
victim-writable files with coredumps. (This could theoretically lead to a
privilege escalation.)
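
For illustration only, the "never write into an existing file" rule can be sketched in userspace C. This is not the kernel code: open_corefile_sketch() and its suid_safe parameter are hypothetical names, and the sketch uses the ordinary unlink(2) and open(2) calls as a rough analogue of unlinking a stale corefile (except in the setuid case) and then creating it with O_EXCL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of the corefile open path.
 * Returns a file descriptor on success or a negative errno value.
 */
int open_corefile_sketch(const char *path, int suid_safe)
{
	int fd;

	assert(path != NULL);

	/*
	 * Outside the setuid case, remove any stale file first. A missing
	 * file is fine; any other problem shows up at open() below.
	 */
	if (!suid_safe)
		(void) unlink(path);

	/*
	 * O_EXCL makes the create fail with EEXIST if the name already
	 * exists, so data is never written into a file someone else
	 * planted. O_NOFOLLOW refuses a symlink as the final component.
	 */
	fd = open(path, O_CREAT | O_RDWR | O_NOFOLLOW | O_EXCL, 0600);
	if (fd < 0)
		return -errno;
	return fd;
}
```

With suid_safe set and a pre-existing file at path, the call is expected to return -EEXIST rather than reuse the file; losing the unlink/create race to another dumping process produces the same error, which the patch treats as acceptable.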

Signed-off-by: Jann Horn <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/exec.c | 38 ++++++++++++++++++++++++++++++++------
1 file changed, 32 insertions(+), 6 deletions(-)

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -2134,9 +2134,9 @@ void do_coredump(long signr, int exit_co
const struct cred *old_cred;
struct cred *cred;
int retval = 0;
- int flag = 0;
int ispipe;
- bool need_nonrelative = false;
+ /* require nonrelative corefile path and be extra careful */
+ bool need_suid_safe = false;
static atomic_t core_dump_count = ATOMIC_INIT(0);
struct coredump_params cprm = {
.signr = signr,
@@ -2169,9 +2169,8 @@ void do_coredump(long signr, int exit_co
*/
if (__get_dumpable(cprm.mm_flags) == 2) {
/* Setuid core dump mode */
- flag = O_EXCL; /* Stop rewrite attacks */
cred->fsuid = 0; /* Dump root private */
- need_nonrelative = true;
+ need_suid_safe = true;
}

retval = coredump_wait(exit_code, &core_state);
@@ -2251,7 +2250,7 @@ void do_coredump(long signr, int exit_co
if (cprm.limit < binfmt->min_coredump)
goto fail_unlock;

- if (need_nonrelative && cn.corename[0] != '/') {
+ if (need_suid_safe && cn.corename[0] != '/') {
printk(KERN_WARNING "Pid %d(%s) can only dump core "\
"to fully qualified path!\n",
task_tgid_vnr(current), current->comm);
@@ -2259,8 +2258,35 @@ void do_coredump(long signr, int exit_co
goto fail_unlock;
}

+ /*
+ * Unlink the file if it exists unless this is a SUID
+ * binary - in that case, we're running around with root
+ * privs and don't want to unlink another user's coredump.
+ */
+ if (!need_suid_safe) {
+ mm_segment_t old_fs;
+
+ old_fs = get_fs();
+ set_fs(KERNEL_DS);
+ /*
+ * If it doesn't exist, that's fine. If there's some
+ * other problem, we'll catch it at the filp_open().
+ */
+ (void) sys_unlink((const char __user *)cn.corename);
+ set_fs(old_fs);
+ }
+
+ /*
+ * There is a race between unlinking and creating the
+ * file, but if that causes an EEXIST here, that's
+ * fine - another process raced with us while creating
+ * the corefile, and the other process won. To userspace,
+ * what matters is that at least one of the two processes
+ * writes its coredump successfully, not which one.
+ */
cprm.file = filp_open(cn.corename,
- O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
+ O_CREAT | 2 | O_NOFOLLOW |
+ O_LARGEFILE | O_EXCL,
0600);
if (IS_ERR(cprm.file))
goto fail_unlock;

2015-11-24 22:36:58

by Ben Hutchings

Subject: [PATCH 3.2 11/52] Btrfs: fix file corruption and data loss after cloning inline extents

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Filipe Manana <[email protected]>

commit 8039d87d9e473aeb740d4fdbd59b9d2f89b2ced9 upstream.

Currently the clone ioctl allows cloning an inline extent from one file
to another that already has other (non-inlined) extents. This is a problem
because btrfs is not designed to deal with files having inline and regular
extents: if a file has an inline extent then it must be the only extent
in the file and must start at file offset 0. Having a file with an inline
extent followed by regular extents results in EIO errors when doing reads
or writes against the first 4K of the file.

Also, the clone ioctl allows one to lose data if the source file consists
of a single inline extent, with a size of N bytes, and the destination
file consists of a single inline extent with a size of M bytes, where we
have M > N. In this case the clone operation removes the inline extent
from the destination file and then copies the inline extent from the
source file into the destination file - we lose the M - N bytes from the
destination file, a read operation will get the value 0x00 for any bytes
in the range [N, M] (the destination inode's i_size remained as M,
that's why we can read past N bytes).

So fix this by not allowing such destructive operations to happen and
return errno EOPNOTSUPP to user space.
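
As a rough, standalone model of the rule described above (not the kernel implementation), the accept/reject decision can be sketched as follows. The struct dst_file_sketch fields and the function name are hypothetical simplifications of what the real code reads from the extent tree; only the outcomes are meant to match the cases described in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the destination file's state. */
struct dst_file_sketch {
	uint64_t i_size;             /* destination inode size in bytes */
	uint64_t extent_at_zero_len; /* length of a non-inline extent at
				      * offset 0; 0 if none or if inline */
	bool has_extent_after_zero;  /* any extent item at offset > 0 */
};

/*
 * Returns 0 if cloning an inline extent of datal bytes to dst_offset
 * would be allowed in this model, -EOPNOTSUPP otherwise.
 */
int clone_inline_check_sketch(uint64_t dst_offset, uint64_t datal,
			      uint64_t sectorsize,
			      const struct dst_file_sketch *dst)
{
	uint64_t aligned_end;

	/* The rounding below assumes a power-of-two sector size. */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	aligned_end = (dst_offset + datal + sectorsize - 1) &
		      ~(sectorsize - 1);

	/* An inline extent must start at file offset 0. */
	if (dst_offset > 0)
		return -EOPNOTSUPP;
	/* An inline extent must be the only extent in the file. */
	if (dst->has_extent_after_zero)
		return -EOPNOTSUPP;
	/* E.g. a prealloc extent reaching beyond the cloned range. */
	if (dst->extent_at_zero_len > aligned_end)
		return -EOPNOTSUPP;
	/* Overwriting would lose the destination's bytes past datal. */
	if (dst->i_size > datal)
		return -EOPNOTSUPP;
	return 0;
}
```

For example, with a 50-byte source extent and a 4096-byte sector size, a destination holding a 40-byte inline extent passes the check, while one holding a 60-byte inline extent is rejected.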

Currently the fstest btrfs/035 tests the data loss case but it totally
ignores this - i.e. expects the operation to succeed and does not check
that we got data loss.

The following test case for fstests exercises all these cases that result
in file corruption and data loss:

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

_cleanup()
{
rm -f $tmp.*
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter

# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_cloner
_require_btrfs_fs_feature "no_holes"
_require_btrfs_mkfs_feature "no-holes"

rm -f $seqres.full

test_cloning_inline_extents()
{
local mkfs_opts=$1
local mount_opts=$2

_scratch_mkfs $mkfs_opts >>$seqres.full 2>&1
_scratch_mount $mount_opts

# File bar, the source for all the following clone operations, consists
# of a single inline extent (50 bytes).
$XFS_IO_PROG -f -c "pwrite -S 0xbb 0 50" $SCRATCH_MNT/bar \
| _filter_xfs_io

# Test cloning into a file with an extent (non-inlined) where the
# destination offset overlaps that extent. It should not be possible to
# clone the inline extent from file bar into this file.
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 16K" $SCRATCH_MNT/foo \
| _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo

# Doing IO against any range in the first 4K of the file should work.
# Due to a past clone ioctl bug which allowed cloning the inline extent,
# these operations resulted in EIO errors.
echo "File foo data after clone operation:"
# All bytes should have the value 0xaa (clone operation failed and did
# not modify our file).
od -t x1 $SCRATCH_MNT/foo
$XFS_IO_PROG -c "pwrite -S 0xcc 0 100" $SCRATCH_MNT/foo | _filter_xfs_io

# Test cloning the inline extent against a file which has a hole in its
# first 4K followed by a non-inlined extent. It should not be possible
# as well to clone the inline extent from file bar into this file.
$XFS_IO_PROG -f -c "pwrite -S 0xdd 4K 12K" $SCRATCH_MNT/foo2 \
| _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo2

# Doing IO against any range in the first 4K of the file should work.
# Due to a past clone ioctl bug which allowed cloning the inline extent,
# these operations resulted in EIO errors.
echo "File foo2 data after clone operation:"
# All bytes should have the value 0x00 (clone operation failed and did
# not modify our file).
od -t x1 $SCRATCH_MNT/foo2
$XFS_IO_PROG -c "pwrite -S 0xee 0 90" $SCRATCH_MNT/foo2 | _filter_xfs_io

# Test cloning the inline extent against a file which has a size of zero
# but has a prealloc extent. It should not be possible as well to clone
# the inline extent from file bar into this file.
$XFS_IO_PROG -f -c "falloc -k 0 1M" $SCRATCH_MNT/foo3 | _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo3

# Doing IO against any range in the first 4K of the file should work.
# Due to a past clone ioctl bug which allowed cloning the inline extent,
# these operations resulted in EIO errors.
echo "First 50 bytes of foo3 after clone operation:"
# Should not be able to read any bytes, file has 0 bytes i_size (the
# clone operation failed and did not modify our file).
od -t x1 $SCRATCH_MNT/foo3
$XFS_IO_PROG -c "pwrite -S 0xff 0 90" $SCRATCH_MNT/foo3 | _filter_xfs_io

# Test cloning the inline extent against a file which consists of a
# single inline extent that has a size not greater than the size of
# bar's inline extent (40 < 50).
# It should be possible to do the extent cloning from bar to this file.
$XFS_IO_PROG -f -c "pwrite -S 0x01 0 40" $SCRATCH_MNT/foo4 \
| _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo4

# Doing IO against any range in the first 4K of the file should work.
echo "File foo4 data after clone operation:"
# Must match file bar's content.
od -t x1 $SCRATCH_MNT/foo4
$XFS_IO_PROG -c "pwrite -S 0x02 0 90" $SCRATCH_MNT/foo4 | _filter_xfs_io

# Test cloning the inline extent against a file which consists of a
# single inline extent that has a size greater than the size of bar's
# inline extent (60 > 50).
# It should not be possible to clone the inline extent from file bar
# into this file.
$XFS_IO_PROG -f -c "pwrite -S 0x03 0 60" $SCRATCH_MNT/foo5 \
| _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo5

# Reading the file should not fail.
echo "File foo5 data after clone operation:"
# Must have a size of 60 bytes, with all bytes having a value of 0x03
# (the clone operation failed and did not modify our file).
od -t x1 $SCRATCH_MNT/foo5

# Test cloning the inline extent against a file which has no extents but
# has a size greater than bar's inline extent (16K > 50).
# It should not be possible to clone the inline extent from file bar
# into this file.
$XFS_IO_PROG -f -c "truncate 16K" $SCRATCH_MNT/foo6 | _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo6

# Reading the file should not fail.
echo "File foo6 data after clone operation:"
# Must have a size of 16K, with all bytes having a value of 0x00 (the
# clone operation failed and did not modify our file).
od -t x1 $SCRATCH_MNT/foo6

# Test cloning the inline extent against a file which has no extents but
# has a size not greater than bar's inline extent (30 < 50).
# It should be possible to clone the inline extent from file bar into
# this file.
$XFS_IO_PROG -f -c "truncate 30" $SCRATCH_MNT/foo7 | _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo7

# Reading the file should not fail.
echo "File foo7 data after clone operation:"
# Must have a size of 50 bytes, with all bytes having a value of 0xbb.
od -t x1 $SCRATCH_MNT/foo7

# Test cloning the inline extent against a file which has a size not
# greater than the size of bar's inline extent (20 < 50) but has
# a prealloc extent that goes beyond the file's size. It should not be
# possible to clone the inline extent from bar into this file.
$XFS_IO_PROG -f -c "falloc -k 0 1M" \
-c "pwrite -S 0x88 0 20" \
$SCRATCH_MNT/foo8 | _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo8

echo "File foo8 data after clone operation:"
# Must have a size of 20 bytes, with all bytes having a value of 0x88
# (the clone operation did not modify our file).
od -t x1 $SCRATCH_MNT/foo8

_scratch_unmount
}

echo -e "\nTesting without compression and without the no-holes feature...\n"
test_cloning_inline_extents

echo -e "\nTesting with compression and without the no-holes feature...\n"
test_cloning_inline_extents "" "-o compress"

echo -e "\nTesting without compression and with the no-holes feature...\n"
test_cloning_inline_extents "-O no-holes" ""

echo -e "\nTesting with compression and with the no-holes feature...\n"
test_cloning_inline_extents "-O no-holes" "-o compress"

status=0
exit

Signed-off-by: Filipe Manana <[email protected]>
[bwh: Backported to 3.2:
- Adjust parameters to btrfs_drop_extents()
- Drop use of ASSERT()
- Keep using BUG_ON() for other error cases, as there is no
btrfs_abort_transaction()
- Adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/btrfs/ioctl.c | 195 +++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 152 insertions(+), 43 deletions(-)

--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2164,6 +2164,151 @@ out:
return ret;
}

+/*
+ * Make sure we do not end up inserting an inline extent into a file that has
+ * already other (non-inline) extents. If a file has an inline extent it can
+ * not have any other extents and the (single) inline extent must start at the
+ * file offset 0. Failing to respect these rules will lead to file corruption,
+ * resulting in EIO errors on read/write operations, hitting BUG_ON's in mm, etc
+ *
+ * We can have extents that have been already written to disk or we can have
+ * dirty ranges still in delalloc, in which case the extent maps and items are
+ * created only when we run delalloc, and the delalloc ranges might fall outside
+ * the range we are currently locking in the inode's io tree. So we check the
+ * inode's i_size because of that (i_size updates are done while holding the
+ * i_mutex, which we are holding here).
+ * We also check to see if the inode has a size not greater than "datal" but has
+ * extents beyond it, due to an fallocate with FALLOC_FL_KEEP_SIZE (and we are
+ * protected against such concurrent fallocate calls by the i_mutex).
+ *
+ * If the file has no extents but a size greater than datal, do not allow the
+ * copy because we would need turn the inline extent into a non-inline one (even
+ * with NO_HOLES enabled). If we find our destination inode only has one inline
+ * extent, just overwrite it with the source inline extent if its size is less
+ * than the source extent's size, or we could copy the source inline extent's
+ * data into the destination inode's inline extent if the later is greater then
+ * the former.
+ */
+static int clone_copy_inline_extent(struct inode *src,
+ struct inode *dst,
+ struct btrfs_trans_handle *trans,
+ struct btrfs_path *path,
+ struct btrfs_key *new_key,
+ const u64 drop_start,
+ const u64 datal,
+ const u64 skip,
+ const u64 size,
+ char *inline_data)
+{
+ struct btrfs_root *root = BTRFS_I(dst)->root;
+ const u64 aligned_end = ALIGN(new_key->offset + datal,
+ root->sectorsize);
+ int ret;
+ struct btrfs_key key;
+ u64 hint_byte;
+
+ if (new_key->offset > 0)
+ return -EOPNOTSUPP;
+
+ key.objectid = btrfs_ino(dst);
+ key.type = BTRFS_EXTENT_DATA_KEY;
+ key.offset = 0;
+ ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+ if (ret < 0) {
+ return ret;
+ } else if (ret > 0) {
+ if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
+ ret = btrfs_next_leaf(root, path);
+ if (ret < 0)
+ return ret;
+ else if (ret > 0)
+ goto copy_inline_extent;
+ }
+ btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+ if (key.objectid == btrfs_ino(dst) &&
+ key.type == BTRFS_EXTENT_DATA_KEY) {
+ return -EOPNOTSUPP;
+ }
+ } else if (i_size_read(dst) <= datal) {
+ struct btrfs_file_extent_item *ei;
+ u64 ext_len;
+
+ /*
+ * If the file size is <= datal, make sure there are no other
+ * extents following (can happen do to an fallocate call with
+ * the flag FALLOC_FL_KEEP_SIZE).
+ */
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ /*
+ * If it's an inline extent, it can not have other extents
+ * following it.
+ */
+ if (btrfs_file_extent_type(path->nodes[0], ei) ==
+ BTRFS_FILE_EXTENT_INLINE)
+ goto copy_inline_extent;
+
+ ext_len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
+ if (ext_len > aligned_end)
+ return -EOPNOTSUPP;
+
+ ret = btrfs_next_item(root, path);
+ if (ret < 0) {
+ return ret;
+ } else if (ret == 0) {
+ btrfs_item_key_to_cpu(path->nodes[0], &key,
+ path->slots[0]);
+ if (key.objectid == btrfs_ino(dst) &&
+ key.type == BTRFS_EXTENT_DATA_KEY)
+ return -EOPNOTSUPP;
+ }
+ }
+
+copy_inline_extent:
+ /*
+ * We have no extent items, or we have an extent at offset 0 which may
+ * or may not be inlined. All these cases are dealt the same way.
+ */
+ if (i_size_read(dst) > datal) {
+ /*
+ * If the destination inode has an inline extent...
+ * This would require copying the data from the source inline
+ * extent into the beginning of the destination's inline extent.
+ * But this is really complex, both extents can be compressed
+ * or just one of them, which would require decompressing and
+ * re-compressing data (which could increase the new compressed
+ * size, not allowing the compressed data to fit anymore in an
+ * inline extent).
+ * So just don't support this case for now (it should be rare,
+ * we are not really saving space when cloning inline extents).
+ */
+ return -EOPNOTSUPP;
+ }
+
+ btrfs_release_path(path);
+ ret = btrfs_drop_extents(trans, dst, drop_start, aligned_end,
+ &hint_byte, 1);
+ if (ret)
+ return ret;
+ ret = btrfs_insert_empty_item(trans, root, path, new_key, size);
+ if (ret)
+ return ret;
+
+ if (skip) {
+ const u32 start = btrfs_file_extent_calc_inline_size(0);
+
+ memmove(inline_data + start, inline_data + start + skip, datal);
+ }
+
+ write_extent_buffer(path->nodes[0], inline_data,
+ btrfs_item_ptr_offset(path->nodes[0],
+ path->slots[0]),
+ size);
+ inode_add_bytes(dst, datal);
+
+ return 0;
+}
+
static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
u64 off, u64 olen, u64 destoff)
{
@@ -2448,20 +2593,6 @@ static noinline long btrfs_ioctl_clone(s
new_key.offset += skip;
}

- /*
- * Don't copy an inline extent into an offset
- * greater than zero. Having an inline extent
- * at such an offset results in chaos as btrfs
- * isn't prepared for such cases. Just skip
- * this case for the same reasons as commented
- * at btrfs_ioctl_clone().
- */
- if (new_key.offset > 0) {
- ret = -EOPNOTSUPP;
- btrfs_end_transaction(trans, root);
- goto out;
- }
-
if (key.offset + datal > off+len)
trim = key.offset + datal - (off+len);

@@ -2473,29 +2604,20 @@ static noinline long btrfs_ioctl_clone(s
size -= skip + trim;
datal -= skip + trim;

- ret = btrfs_drop_extents(trans, inode,
- new_key.offset,
- new_key.offset + datal,
- &hint_byte, 1);
- BUG_ON(ret);
-
- ret = btrfs_insert_empty_item(trans, root, path,
- &new_key, size);
- BUG_ON(ret);
-
- if (skip) {
- u32 start =
- btrfs_file_extent_calc_inline_size(0);
- memmove(buf+start, buf+start+skip,
- datal);
+ ret = clone_copy_inline_extent(src, inode,
+ trans, path,
+ &new_key,
+ new_key.offset,
+ datal,
+ skip, size, buf);
+ if (ret) {
+ BUG_ON(ret != -EOPNOTSUPP);
+ btrfs_end_transaction(trans, root);
+ goto out;
}

leaf = path->nodes[0];
slot = path->slots[0];
- write_extent_buffer(leaf, buf,
- btrfs_item_ptr_offset(leaf, slot),
- size);
- inode_add_bytes(inode, datal);
}

btrfs_mark_buffer_dirty(leaf);

2015-11-24 22:37:10

by Ben Hutchings

Subject: [PATCH 3.2 15/52] Btrfs: fix truncation of compressed and inlined extents

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Filipe Manana <[email protected]>

commit 0305cd5f7fca85dae392b9ba85b116896eb7c1c7 upstream.

When truncating a file to a smaller size which consists of an inline
extent that is compressed, we did not discard (or make unusable) the
data between the new file size and the old file size, wasting metadata
space and allowing for the truncated data to be leaked and the data
corruption/loss mentioned below.
We were also not correctly decrementing the number of bytes used by the
inode, we were setting it to zero, giving a wrong report for callers of
the stat(2) syscall. The fsck tool also reported an error about a mismatch
between the nbytes of the file versus the real space used by the file.

Now because we weren't discarding the truncated region of the file, it
was possible for a caller of the clone ioctl to actually read the data
that was truncated, allowing for a security breach without requiring root
access to the system, using only standard filesystem operations. The
scenario is the following:

1) User A creates a file which consists of an inline and compressed
extent with a size of 2000 bytes - the file is not accessible to
any other users (no read, write or execution permission for anyone
else);

2) The user truncates the file to a size of 1000 bytes;

3) User A makes the file world readable;

4) User B creates a file consisting of an inline extent of 2000 bytes;

5) User B issues a clone operation from user A's file into its own
file (using a length argument of 0, clone the whole range);

6) User B now gets to see the 1000 bytes that user A truncated from
its file before it made its file world readable. User B also lost
the bytes in the range [1000, 2000[ bytes from its own file, but
that might be ok if his/her intention was reading stale data from
user A that was never supposed to be public.

Note that this contrasts with the case where we truncate a file from 2000
bytes to 1000 bytes and then truncate it back from 1000 to 2000 bytes. In
this case reading any byte from the range [1000, 2000[ will return a value
of 0x00, instead of the original data.
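
The scenario can be illustrated with a toy in-memory model. This is a sketch under loud assumptions: the struct and the three helpers below are hypothetical, nothing here is compressed or stored on disk, and the "wipe" variant only stands in for whatever makes the truncated bytes unreadable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX 2048

/* Hypothetical stand-in for a file made of one inline extent. */
struct inline_file_sketch {
	unsigned char data[SKETCH_MAX]; /* inline extent payload */
	size_t extent_len;              /* bytes stored in the extent */
	size_t i_size;                  /* size visible to readers */
};

/* Problem case: only i_size shrinks, the payload keeps the old bytes. */
void truncate_keep_extent_sketch(struct inline_file_sketch *f,
				 size_t new_size)
{
	f->i_size = new_size;
}

/* Safer variant in this model: also wipe the bytes past new_size. */
void truncate_wipe_tail_sketch(struct inline_file_sketch *f,
			       size_t new_size)
{
	if (new_size < f->extent_len)
		memset(f->data + new_size, 0, f->extent_len - new_size);
	f->i_size = new_size;
}

/* Clone as in step 5: the whole source extent replaces dst's extent. */
void clone_inline_sketch(const struct inline_file_sketch *src,
			 struct inline_file_sketch *dst)
{
	assert(src->extent_len <= SKETCH_MAX);
	memcpy(dst->data, src->data, src->extent_len);
	dst->extent_len = src->extent_len;
	/* dst->i_size is left untouched, matching step 6 above. */
}
```

In this model, truncating user A's 2000-byte file to 1000 bytes with the first helper and then cloning into user B's 2000-byte file leaves A's old bytes readable in B's range [1000, 2000[; with the second helper the same range reads back as 0x00.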

This problem exists since the clone ioctl was added and happens both with
and without my recent data loss and file corruption fixes for the clone
ioctl (patch "Btrfs: fix file corruption and data loss after cloning
inline extents").

So fix this by truncating the compressed inline extents as we do for the
non-compressed case, which involves decompressing, if the data isn't already
in the page cache, compressing the truncated version of the extent, writing
the compressed content into the inline extent and then truncating it.

The following test case for fstests reproduces the problem. In order for
the test to pass both this fix and my previous fix for the clone ioctl
that forbids cloning a smaller inline extent into a larger one,
which is titled "Btrfs: fix file corruption and data loss after cloning
inline extents", are needed. Without that other fix the test fails in a
different way that does not leak the truncated data; instead part of the
destination file gets replaced with zeroes (because the destination file
has a larger inline extent than the source).

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

_cleanup()
{
rm -f $tmp.*
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter

# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_cloner

rm -f $seqres.full

_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount "-o compress"

# Create our test files. File foo is going to be the source of a clone operation
# and consists of a single inline extent with an uncompressed size of 512 bytes,
# while file bar consists of a single inline extent with an uncompressed size of
# 256 bytes. For our test's purpose, it's important that file bar has an inline
# extent with a size smaller than foo's inline extent.
$XFS_IO_PROG -f -c "pwrite -S 0xa1 0 128" \
-c "pwrite -S 0x2a 128 384" \
$SCRATCH_MNT/foo | _filter_xfs_io
$XFS_IO_PROG -f -c "pwrite -S 0xbb 0 256" $SCRATCH_MNT/bar | _filter_xfs_io

# Now durably persist all metadata and data. We do this to make sure that we get
# on disk an inline extent with a size of 512 bytes for file foo.
sync

# Now truncate our file foo to a smaller size. Because it consists of a
# compressed and inline extent, btrfs did not shrink the inline extent to the
# new size (if the extent was not compressed, btrfs would shrink it to 128
# bytes), it only updates the inode's i_size to 128 bytes.
$XFS_IO_PROG -c "truncate 128" $SCRATCH_MNT/foo

# Now clone foo's inline extent into bar.
# This clone operation should fail with errno EOPNOTSUPP because the source
# file consists only of an inline extent and the file's size is smaller than
# the inline extent of the destination (128 bytes < 256 bytes). However the
# clone ioctl was not prepared to deal with a file that has a size smaller
# than the size of its inline extent (something that happens only for compressed
# inline extents), resulting in copying the full inline extent from the source
# file into the destination file.
#
# Note that btrfs' clone operation for inline extents consists of removing the
# inline extent from the destination inode and copy the inline extent from the
# source inode into the destination inode, meaning that if the destination
# inode's inline extent is larger (N bytes) than the source inode's inline
# extent (M bytes), some bytes (N - M bytes) will be lost from the destination
# file. Btrfs could copy the source inline extent's data into the destination's
# inline extent so that we would not lose any data, but that's currently not
# done due to the complexity that would be needed to deal with such cases
# (especially when one or both extents are compressed), returning EOPNOTSUPP, as
# it's normally not a very common case to clone very small files (only case
# where we get inline extents) and copying inline extents does not save any
# space (unlike for normal, non-inlined extents).
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar

# Now because the above clone operation used to succeed, and due to foo's inline
# extent not being shrunk by the truncate operation, our file bar got the whole
# inline extent copied from foo, making us lose the last 128 bytes from bar
# which got replaced by the bytes in range [128, 256[ from foo before foo was
# truncated - in other words, data loss from bar and being able to read old and
# stale data from foo that should not be possible to read anymore through normal
# filesystem operations. Contrast with the case where we truncate a file from a
# size N to a smaller size M, truncate it back to size N and then read the range
# [M, N[, we should always get the value 0x00 for all the bytes in that range.

# We expected the clone operation to fail with errno EOPNOTSUPP and therefore
# not modify file bar's data/metadata. So its content should be 256 bytes
# long with all bytes having the value 0xbb.
#
# Without the btrfs bug fix, the clone operation succeeded and resulted in
# leaking truncated data from foo, the bytes that belonged to its range
# [128, 256[, and losing data from bar in that same range. So reading the
# file gave us the following content:
#
# 0000000 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1
# *
# 0000200 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a
# *
# 0000400
echo "File bar's content after the clone operation:"
od -t x1 $SCRATCH_MNT/bar

# Also because the foo's inline extent was not shrunk by the truncate
# operation, btrfs' fsck, which is run by the fstests framework every time a
# test completes, failed reporting the following error:
#
# root 5 inode 257 errors 400, nbytes wrong

status=0
exit

Signed-off-by: Filipe Manana <[email protected]>
[bwh: Backported to 3.2:
- Adjust parameters to btrfs_truncate_page() and btrfs_truncate_item()
- Pass transaction pointer into truncate_inline_extent()
- Add prototype of btrfs_truncate_page()
- s/test_bit(BTRFS_ROOT_REF_COWS, &root->state)/root->ref_cows/
- Keep using BUG_ON() for other error cases, as there is no
btrfs_abort_transaction()
- Adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -88,6 +88,7 @@ static unsigned char btrfs_type_by_mode[
};

static int btrfs_setsize(struct inode *inode, loff_t newsize);
+static int btrfs_truncate_page(struct address_space *mapping, loff_t from);
static int btrfs_truncate(struct inode *inode);
static int btrfs_finish_ordered_io(struct inode *inode, u64 start, u64 end);
static noinline int cow_file_range(struct inode *inode,
@@ -2992,6 +2993,47 @@ out:
return err;
}

+static int truncate_inline_extent(struct btrfs_trans_handle *trans,
+ struct inode *inode,
+ struct btrfs_path *path,
+ struct btrfs_key *found_key,
+ const u64 item_end,
+ const u64 new_size)
+{
+ struct extent_buffer *leaf = path->nodes[0];
+ int slot = path->slots[0];
+ struct btrfs_file_extent_item *fi;
+ u32 size = (u32)(new_size - found_key->offset);
+ struct btrfs_root *root = BTRFS_I(inode)->root;
+
+ fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+
+ if (btrfs_file_extent_compression(leaf, fi) != BTRFS_COMPRESS_NONE) {
+ loff_t offset = new_size;
+
+ /*
+ * Zero out the remaining of the last page of our inline extent,
+ * instead of directly truncating our inline extent here - that
+ * would be much more complex (decompressing all the data, then
+ * compressing the truncated data, which might be bigger than
+ * the size of the inline extent, resize the extent, etc).
+ * We release the path because to get the page we might need to
+ * read the extent item from disk (data not in the page cache).
+ */
+ btrfs_release_path(path);
+ return btrfs_truncate_page(inode->i_mapping, offset);
+ }
+
+ btrfs_set_file_extent_ram_bytes(leaf, fi, size);
+ size = btrfs_file_extent_calc_inline_size(size);
+ btrfs_truncate_item(trans, root, path, size, 1);
+
+ if (root->ref_cows)
+ inode_sub_bytes(inode, item_end + 1 - new_size);
+
+ return 0;
+}
+
/*
* this can truncate away extent items, csum items and directory items.
* It starts at a high offset and removes keys until it can't find
@@ -3153,28 +3195,30 @@ search_again:
* special encodings
*/
if (!del_item &&
- btrfs_file_extent_compression(leaf, fi) == 0 &&
btrfs_file_extent_encryption(leaf, fi) == 0 &&
btrfs_file_extent_other_encoding(leaf, fi) == 0) {
- u32 size = new_size - found_key.offset;
-
- if (root->ref_cows) {
- inode_sub_bytes(inode, item_end + 1 -
- new_size);
- }

/*
- * update the ram bytes to properly reflect
- * the new size of our item
+ * Need to release path in order to truncate a
+ * compressed extent. So delete any accumulated
+ * extent items so far.
*/
- btrfs_set_file_extent_ram_bytes(leaf, fi, size);
- size =
- btrfs_file_extent_calc_inline_size(size);
- ret = btrfs_truncate_item(trans, root, path,
- size, 1);
+ if (btrfs_file_extent_compression(leaf, fi) !=
+ BTRFS_COMPRESS_NONE && pending_del_nr) {
+ err = btrfs_del_items(trans, root, path,
+ pending_del_slot,
+ pending_del_nr);
+ BUG_ON(err);
+ pending_del_nr = 0;
+ }
+
+ err = truncate_inline_extent(trans, inode,
+ path, &found_key,
+ item_end,
+ new_size);
+ BUG_ON(err);
} else if (root->ref_cows) {
- inode_sub_bytes(inode, item_end + 1 -
- found_key.offset);
+ inode_sub_bytes(inode, item_end + 1 - new_size);
}
}
delete:

2015-11-24 22:37:15

by Ben Hutchings

Subject: [PATCH 3.2 44/52] FS-Cache: Handle a write to the page immediately beyond the EOF marker

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: David Howells <[email protected]>

commit 102f4d900c9c8f5ed89ae4746d493fe3ebd7ba64 upstream.

Handle a write being requested to the page immediately beyond the EOF
marker on a cache object. Currently this gets an assertion failure in
CacheFiles because the EOF marker is used there to encode information about
a partial page at the EOF - which could lead to an unknown blank spot in
the file if we extend the file over it.

The problem is actually in fscache where we check the index of the page
being written against store_limit. store_limit is set to the number of
pages that we're allowed to store by fscache_set_store_limit() - which
means it's one more than the index of the last page we're allowed to store.
The problem is that we permit writing to a page with an index _equal_ to
the store limit - when we should reject that case.
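
The off-by-one described above can be sketched in isolation. This is only
an illustration, not the fscache code: may_store_page() and its parameters
are hypothetical names, and only the comparison mirrors the fix.

```c
#include <stdbool.h>

/*
 * Hypothetical helper for illustration only.
 * store_limit is a page count, so the valid page indices are
 * 0 .. store_limit - 1.
 */
static bool may_store_page(unsigned long index, unsigned long store_limit)
{
	/*
	 * The old check only rejected index > store_limit, which let
	 * index == store_limit through.  Rejecting index >= store_limit
	 * is the same as accepting only index < store_limit.
	 */
	return index < store_limit;
}
```

With a store limit of 1 page, only index 0 is accepted; index 1 is the page
immediately beyond the limit and is rejected.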

Whilst we're at it, change the triggered assertion in CacheFiles to just
return -ENOBUFS instead.

The assertion failure looks something like this:

CacheFiles: Assertion failed
1000 < 7b1 is false
------------[ cut here ]------------
kernel BUG at fs/cachefiles/rdwr.c:962!
...
RIP: 0010:[<ffffffffa02c9e83>] [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles]

Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
[bwh: Backported to 3.2: we don't have __kernel_write() so keep using the
open-coded equivalent]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/cachefiles/rdwr.c | 67 +++++++++++++++++++++++++++++-----------------------
fs/fscache/page.c | 2 +-
2 files changed, 38 insertions(+), 31 deletions(-)

--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -914,6 +914,15 @@ int cachefiles_write_page(struct fscache
cache = container_of(object->fscache.cache,
struct cachefiles_cache, cache);

+ pos = (loff_t)page->index << PAGE_SHIFT;
+
+ /* We mustn't write more data than we have, so we have to beware of a
+ * partial page at EOF.
+ */
+ eof = object->fscache.store_limit_l;
+ if (pos >= eof)
+ goto error;
+
/* write the page to the backing filesystem and let it store it in its
* own time */
dget(object->backer);
@@ -922,47 +931,46 @@ int cachefiles_write_page(struct fscache
cache->cache_cred);
if (IS_ERR(file)) {
ret = PTR_ERR(file);
- } else {
+ goto error_2;
+ }
+ if (!file->f_op->write) {
ret = -EIO;
- if (file->f_op->write) {
- pos = (loff_t) page->index << PAGE_SHIFT;
-
- /* we mustn't write more data than we have, so we have
- * to beware of a partial page at EOF */
- eof = object->fscache.store_limit_l;
- len = PAGE_SIZE;
- if (eof & ~PAGE_MASK) {
- ASSERTCMP(pos, <, eof);
- if (eof - pos < PAGE_SIZE) {
- _debug("cut short %llx to %llx",
- pos, eof);
- len = eof - pos;
- ASSERTCMP(pos + len, ==, eof);
- }
- }
-
- data = kmap(page);
- old_fs = get_fs();
- set_fs(KERNEL_DS);
- ret = file->f_op->write(
- file, (const void __user *) data, len, &pos);
- set_fs(old_fs);
- kunmap(page);
- if (ret != len)
- ret = -EIO;
- }
- fput(file);
+ goto error_2;
}

- if (ret < 0) {
- if (ret == -EIO)
- cachefiles_io_error_obj(
- object, "Write page to backing file failed");
- ret = -ENOBUFS;
+ len = PAGE_SIZE;
+ if (eof & ~PAGE_MASK) {
+ if (eof - pos < PAGE_SIZE) {
+ _debug("cut short %llx to %llx",
+ pos, eof);
+ len = eof - pos;
+ ASSERTCMP(pos + len, ==, eof);
+ }
}

- _leave(" = %d", ret);
- return ret;
+ data = kmap(page);
+ old_fs = get_fs();
+ set_fs(KERNEL_DS);
+ ret = file->f_op->write(
+ file, (const void __user *) data, len, &pos);
+ set_fs(old_fs);
+ kunmap(page);
+ fput(file);
+ if (ret != len)
+ goto error_eio;
+
+ _leave(" = 0");
+ return 0;
+
+error_eio:
+ ret = -EIO;
+error_2:
+ if (ret == -EIO)
+ cachefiles_io_error_obj(object,
+ "Write page to backing file failed");
+error:
+ _leave(" = -ENOBUFS [%d]", ret);
+ return -ENOBUFS;
}

/*
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -676,7 +676,7 @@ static void fscache_write_op(struct fsca
goto superseded;
page = results[0];
_debug("gang %d [%lx]", n, page->index);
- if (page->index > op->store_limit) {
+ if (page->index >= op->store_limit) {
fscache_stat(&fscache_n_store_pages_over_limit);
goto superseded;
}

2015-11-24 22:37:20

by Ben Hutchings

Subject: [PATCH 3.2 49/52] RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Sowmini Varadhan <[email protected]>

[ Upstream commit 8ce675ff39b9958d1c10f86cf58e357efaafc856 ]

Either of pskb_pull() or pskb_trim() may fail under low memory conditions.
If rds_tcp_data_recv() ignores such failures, the application will
receive corrupted data because the skb has not been correctly
carved to the RDS datagram size.

Avoid this by handling pskb_pull/pskb_trim failure in the same
manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and
retry via the deferred call to rds_send_worker() that gets set up on
ENOMEM from rds_tcp_read_sock().

Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/rds/tcp_recv.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

--- a/net/rds/tcp_recv.c
+++ b/net/rds/tcp_recv.c
@@ -235,8 +235,15 @@ static int rds_tcp_data_recv(read_descri
}

to_copy = min(tc->t_tinc_data_rem, left);
- pskb_pull(clone, offset);
- pskb_trim(clone, to_copy);
+ if (!pskb_pull(clone, offset) ||
+ pskb_trim(clone, to_copy)) {
+ pr_warn("rds_tcp_data_recv: pull/trim failed "
+ "left %zu data_rem %zu skb_len %d\n",
+ left, tc->t_tinc_data_rem, skb->len);
+ kfree_skb(clone);
+ desc->error = -ENOMEM;
+ goto out;
+ }
skb_queue_tail(&tinc->ti_skb_list, clone);

rdsdebug("skb %p data %p len %d off %u to_copy %zu -> "

2015-11-24 22:37:07

by Ben Hutchings

Subject: [PATCH 3.2 30/52] ALSA: hda - Apply pin fixup for HP ProBook 6550b

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit c932b98c1e47312822d911c1bb76e81ef50e389c upstream.

HP ProBook 6550b needs the same pin fixup that is applied to other HP
B-series laptops with docks to make its headphone and dock headphone
jacks work properly. We just need to add the codec SSID to the list.

Bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=191971
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
sound/pci/hda/patch_sigmatel.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_sigmatel.c
+++ b/sound/pci/hda/patch_sigmatel.c
@@ -4980,6 +4980,7 @@ static int find_mute_led_gpio(struct hda
static int hp_blike_system(u32 subsystem_id)
{
switch (subsystem_id) {
+ case 0x103c1473: /* HP ProBook 6550b */
case 0x103c1520:
case 0x103c1521:
case 0x103c1523:

2015-11-24 22:37:04

by Ben Hutchings

Subject: [PATCH 3.2 29/52] ipv6: fix tunnel error handling

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Michal Kubeček <[email protected]>

commit ebac62fe3d24c0ce22dd83afa7b07d1a2aaef44d upstream.

Both tunnel6_protocol and tunnel46_protocol share the same error
handler, tunnel6_err(), which traverses through tunnel6_handlers list.
For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g.
in tunnel46_rcv(). Current code can generate an ICMPv6 error message
with an IPv4 packet embedded in it.

Fixes: 73d605d1abbd ("[IPSEC]: changing API of xfrm6_tunnel_register")
Signed-off-by: Michal Kubecek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/ipv6/tunnel6.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

--- a/net/ipv6/tunnel6.c
+++ b/net/ipv6/tunnel6.c
@@ -145,6 +145,16 @@ static void tunnel6_err(struct sk_buff *
break;
}

+static void tunnel46_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
+ u8 type, u8 code, int offset, __be32 info)
+{
+ struct xfrm6_tunnel *handler;
+
+ for_each_tunnel_rcu(tunnel46_handlers, handler)
+ if (!handler->err_handler(skb, opt, type, code, offset, info))
+ break;
+}
+
static const struct inet6_protocol tunnel6_protocol = {
.handler = tunnel6_rcv,
.err_handler = tunnel6_err,
@@ -153,7 +163,7 @@ static const struct inet6_protocol tunne

static const struct inet6_protocol tunnel46_protocol = {
.handler = tunnel46_rcv,
- .err_handler = tunnel6_err,
+ .err_handler = tunnel46_err,
.flags = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL,
};

2015-11-24 22:36:45

by Ben Hutchings

Subject: [PATCH 3.2 45/52] binfmt_elf: Don't clobber passed executable's file header

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: "Maciej W. Rozycki" <[email protected]>

commit b582ef5c53040c5feef4c96a8f9585b6831e2441 upstream.

Do not clobber the buffer space passed from `search_binary_handler' and
originally preloaded by `prepare_binprm' with the executable's file
header by overwriting it with its interpreter's file header. Instead
keep the buffer space intact and directly use the data structure locally
allocated for the interpreter's file header, fixing a bug introduced in
2.1.14 with loadable module support (linux-mips.org commit beb11695
[Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
Adjust the amount of data read from the interpreter's file accordingly.

This was not an issue before loadable module support, because back then
`load_elf_binary' was executed only once for a given ELF executable,
whether the function succeeded or failed.

With loadable module support present and enabled, upon a failure of
`load_elf_binary' -- which may for example be caused by architecture
code rejecting an executable due to a missing hardware feature requested
in the file header -- a module load is attempted and then the function
reexecuted by `search_binary_handler'. With the executable's file
header replaced with its interpreter's file header the executable can
then be erroneously accepted in this subsequent attempt.

Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/binfmt_elf.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -671,16 +671,16 @@ static int load_elf_binary(struct linux_
*/
would_dump(bprm, interpreter);

- retval = kernel_read(interpreter, 0, bprm->buf,
- BINPRM_BUF_SIZE);
- if (retval != BINPRM_BUF_SIZE) {
+ /* Get the exec headers */
+ retval = kernel_read(interpreter, 0,
+ (void *)&loc->interp_elf_ex,
+ sizeof(loc->interp_elf_ex));
+ if (retval != sizeof(loc->interp_elf_ex)) {
if (retval >= 0)
retval = -EIO;
goto out_free_dentry;
}

- /* Get the exec headers */
- loc->interp_elf_ex = *((struct elfhdr *)bprm->buf);
break;
}
elf_ppnt++;

2015-11-24 22:36:35

by Ben Hutchings

Subject: [PATCH 3.2 39/52] scsi: Fix a bdi reregistration race

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Bart Van Assche <[email protected]>

commit bf2cf3baa20b0a6cd2d08707ef05dc0e992a8aa0 upstream.

Unregister and reregister BDI devices in the proper order. This patch
prevents the following kernel warning from being triggered:

WARNING: CPU: 7 PID: 203 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x68/0x80()
sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:32'
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[<ffffffff814ff5a4>] dump_stack+0x4c/0x65
[<ffffffff810746ba>] warn_slowpath_common+0x8a/0xc0
[<ffffffff81074736>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81237ca8>] sysfs_warn_dup+0x68/0x80
[<ffffffff81237d8e>] sysfs_create_dir_ns+0x7e/0x90
[<ffffffff81291f58>] kobject_add_internal+0xa8/0x320
[<ffffffff812923a0>] kobject_add+0x60/0xb0
[<ffffffff8138c937>] device_add+0x107/0x5e0
[<ffffffff8138d018>] device_create_groups_vargs+0xd8/0x100
[<ffffffff8138d05c>] device_create_vargs+0x1c/0x20
[<ffffffff8117f233>] bdi_register+0x63/0x2a0
[<ffffffff8117f497>] bdi_register_dev+0x27/0x30
[<ffffffff81281549>] add_disk+0x1a9/0x4e0
[<ffffffffa00c5739>] sd_probe_async+0x119/0x1d0 [sd_mod]
[<ffffffff8109a81a>] async_run_entry_fn+0x4a/0x140
[<ffffffff81091078>] process_one_work+0x1d8/0x7c0
[<ffffffff81091774>] worker_thread+0x114/0x460
[<ffffffff81097878>] kthread+0xf8/0x110
[<ffffffff8150801f>] ret_from_fork+0x3f/0x70

See also patch "block: destroy bdi before blockdev is unregistered"
(commit ID 6cd18e711dd8).

Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/scsi/scsi_sysfs.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -955,9 +955,7 @@ void __scsi_remove_device(struct scsi_de
bsg_unregister_queue(sdev->request_queue);
device_unregister(&sdev->sdev_dev);
transport_remove_device(dev);
- device_del(dev);
- } else
- put_device(&sdev->sdev_dev);
+ }

/*
* Stop accepting new requests and wait until all queuecommand() and
@@ -968,6 +966,16 @@ void __scsi_remove_device(struct scsi_de
blk_cleanup_queue(sdev->request_queue);
cancel_work_sync(&sdev->requeue_work);

+ /*
+ * Remove the device after blk_cleanup_queue() has been called such
+ * a possible bdi_register() call with the same name occurs after
+ * blk_cleanup_queue() has called bdi_destroy().
+ */
+ if (sdev->is_visible)
+ device_del(dev);
+ else
+ put_device(&sdev->sdev_dev);
+
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);

2015-11-24 22:36:21

by Ben Hutchings

Subject: [PATCH 3.2 21/52] MIPS: atomic: Fix comment describing atomic64_add_unless's return value.

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Ralf Baechle <[email protected]>

commit f25319d2cb439249a6859f53ad42ffa332b0acba upstream.

Signed-off-by: Ralf Baechle <[email protected]>
Fixes: f24219b4e90cf70ec4a211b17fbabc725a0ddf3c
(cherry picked from commit f0a232cde7be18a207fd057dd79bbac8a0a45dec)
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/mips/include/asm/atomic.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -679,7 +679,7 @@ static __inline__ long atomic64_sub_if_p
* @u: ...unless v is equal to u.
*
* Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v.
+ * Returns true iff @v was not @u.
*/
static __inline__ int atomic64_add_unless(atomic64_t *v, long a, long u)
{

2015-11-24 22:36:15

by Ben Hutchings

Subject: [PATCH 3.2 20/52] ACPI: Use correct IRQ when uninstalling ACPI interrupt handler

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Chen Yu <[email protected]>

commit 49e4b84333f338d4f183f28f1f3c1131b9fb2b5a upstream.

Currently when the system is trying to uninstall the ACPI interrupt
handler, it uses acpi_gbl_FADT.sci_interrupt as the IRQ number.
However, the IRQ number that the ACPI interrupt handler is installed
for comes from acpi_gsi_to_irq() and that is the number that should
be used for the handler removal.

Fix this problem by using the mapped IRQ returned from acpi_gsi_to_irq()
as appropriate.
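
The idea can be modelled outside the kernel as follows. This is a rough
sketch under stated assumptions: map_gsi_to_irq(), install_handler() and
remove_handler() are hypothetical stand-ins, and the "+ 16" mapping is made
up purely so that the GSI and the IRQ differ.

```c
#define INVALID_IRQ ((unsigned int)-1)

/* IRQ recorded at install time; INVALID_IRQ while nothing is installed. */
static unsigned int saved_irq = INVALID_IRQ;

/* Hypothetical GSI-to-IRQ mapping; the offset is arbitrary. */
static unsigned int map_gsi_to_irq(unsigned int gsi)
{
	return gsi + 16;
}

/* Install: remember the mapped IRQ, not the raw GSI. */
static unsigned int install_handler(unsigned int gsi)
{
	saved_irq = map_gsi_to_irq(gsi);
	return saved_irq;
}

/*
 * Remove: release the IRQ that was recorded at install time.
 * Returns that IRQ, or INVALID_IRQ if no handler was installed.
 */
static unsigned int remove_handler(void)
{
	unsigned int irq = saved_irq;

	saved_irq = INVALID_IRQ;
	return irq;
}
```

Removal never needs to redo the mapping, and a second removal is detected
because the recorded value has been reset to the invalid marker.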

Acked-by: Lv Zheng <[email protected]>
Signed-off-by: Chen Yu <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/acpi/osl.c | 9 ++++++---
include/linux/acpi.h | 6 ++++++
2 files changed, 12 insertions(+), 3 deletions(-)

--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -82,6 +82,7 @@ static struct workqueue_struct *kacpid_w
static struct workqueue_struct *kacpi_notify_wq;
struct workqueue_struct *kacpi_hotplug_wq;
EXPORT_SYMBOL(kacpi_hotplug_wq);
+unsigned int acpi_sci_irq = INVALID_ACPI_IRQ;

struct acpi_res_list {
resource_size_t start;
@@ -566,17 +567,19 @@ acpi_os_install_interrupt_handler(u32 gs
acpi_irq_handler = NULL;
return AE_NOT_ACQUIRED;
}
+ acpi_sci_irq = irq;

return AE_OK;
}

-acpi_status acpi_os_remove_interrupt_handler(u32 irq, acpi_osd_handler handler)
+acpi_status acpi_os_remove_interrupt_handler(u32 gsi, acpi_osd_handler handler)
{
- if (irq != acpi_gbl_FADT.sci_interrupt)
+ if (gsi != acpi_gbl_FADT.sci_interrupt || !acpi_sci_irq_valid())
return AE_BAD_PARAMETER;

- free_irq(irq, acpi_irq);
+ free_irq(acpi_sci_irq, acpi_irq);
acpi_irq_handler = NULL;
+ acpi_sci_irq = INVALID_ACPI_IRQ;

return AE_OK;
}
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -110,6 +110,12 @@ int acpi_unregister_ioapic(acpi_handle h
void acpi_irq_stats_init(void);
extern u32 acpi_irq_handled;
extern u32 acpi_irq_not_handled;
+extern unsigned int acpi_sci_irq;
+#define INVALID_ACPI_IRQ ((unsigned)-1)
+static inline bool acpi_sci_irq_valid(void)
+{
+ return acpi_sci_irq != INVALID_ACPI_IRQ;
+}

extern int sbf_port;
extern unsigned long acpi_realmode_flags;

2015-11-24 22:36:18

by Ben Hutchings

Subject: [PATCH 3.2 22/52] ALSA: hda - Disable 64bit address for Creative HDA controllers

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <[email protected]>

commit cadd16ea33a938d49aee99edd4758cc76048b399 upstream.

We've had many reports that some Creative sound cards with CA0132
don't work well. Some reported that it starts working after reloading
the module, while some reported it starts working when a 32bit kernel
is used. All these facts seem to imply that the chip fails to
communicate when the buffer is located at a 64bit address.

This patch addresses these issues by just adding the AZX_DCAPS_NO_64BIT
flag to the corresponding PCI entries. I happened to have a chance to
test an SB Recon3D board, and indeed this seems to help.

Although this hasn't been tested on all Creative devices, it's safer
to assume that this restriction applies to the rest of them, too. So
the flag is applied to all Creative entries.

Signed-off-by: Takashi Iwai <[email protected]>
[bwh: Backported to 3.2: drop the change to AZX_DCAPS_PRESET_CTHDA]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -3099,11 +3099,13 @@ static DEFINE_PCI_DEVICE_TABLE(azx_ids)
.class = PCI_CLASS_MULTIMEDIA_HD_AUDIO << 8,
.class_mask = 0xffffff,
.driver_data = AZX_DRIVER_CTX | AZX_DCAPS_CTX_WORKAROUND |
+ AZX_DCAPS_NO_64BIT |
AZX_DCAPS_RIRB_PRE_DELAY | AZX_DCAPS_POSFIX_LPIB },
#else
/* this entry seems still valid -- i.e. without emu20kx chip */
{ PCI_DEVICE(0x1102, 0x0009),
.driver_data = AZX_DRIVER_CTX | AZX_DCAPS_CTX_WORKAROUND |
+ AZX_DCAPS_NO_64BIT |
AZX_DCAPS_RIRB_PRE_DELAY | AZX_DCAPS_POSFIX_LPIB },
#endif
/* Vortex86MX */

2015-11-24 22:41:34

by Ben Hutchings

Subject: [PATCH 3.2 31/52] firewire: ohci: fix JMicron JMB38x IT context discovery

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Stefan Richter <[email protected]>

commit 100ceb66d5c40cc0c7018e06a9474302470be73c upstream.

Reported by Clifford and Craig for JMicron OHCI-1394 + SDHCI combo
controllers: Often or even most of the time, the controller is
initialized with the message "added OHCI v1.10 device as card 0, 4 IR +
0 IT contexts, quirks 0x10". With 0 isochronous transmit DMA contexts
(IT contexts), applications like audio output are impossible.

However, OHCI-1394 demands that at least 4 IT contexts are implemented
by the link layer controller, and indeed JMicron JMB38x do implement
four of them. Only their IsoXmitIntMask register is unreliable at early
access.

With my own JMB381 single function controller I found:
- I can reproduce the problem with a lower probability than Craig's.
- If I put a loop around the section which clears and reads
IsoXmitIntMask, then either the first or the second attempt will
return the correct initial mask of 0x0000000f. I never encountered
a case of needing more than a second attempt.
- Consequently, if I put a dummy reg_read(...IsoXmitIntMaskSet)
before the first write, the subsequent read will return the correct
result.
- If I merely ignore a wrong read result and force the known real
result, later isochronous transmit DMA usage works just fine.

So let's just fix this chip bug up by the latter method. Tested with
JMB381 on kernel 3.13 and 4.3.

Since OHCI-1394 generally requires 4 IT contexts at a minimum, this
workaround is simply applied whenever the initial read of IsoXmitIntMask
returns 0, regardless of whether it's a JMicron chip or not. I never heard
of this issue together with any other chip though.
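
As a standalone illustration of the workaround (not the driver code):
fixup_it_mask() and count_contexts() are hypothetical helpers, with
count_contexts() standing in for the kernel's hweight32().

```c
#include <stdint.h>

/*
 * Hypothetical helper: if the mask register read back as 0, assume the
 * four IT contexts that the commit message says OHCI-1394 requires.
 */
static uint32_t fixup_it_mask(uint32_t read_mask)
{
	return read_mask ? read_mask : 0xf;
}

/* Number of contexts = number of set bits in the mask. */
static unsigned int count_contexts(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

A zero read is turned into mask 0xf, i.e. four contexts; any non-zero mask
is passed through unchanged.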

I am not 100% sure that this fix works on the OHCI-1394 part of JMB380
and JMB388 combo controllers exactly the same as on the JMB381 single-
function controller, but so far I haven't had a chance to let an owner
of a combo chip run a patched kernel.

Strangely enough, IsoRecvIntMask is always reported correctly, even
though it is probed right before IsoXmitIntMask.

Reported-by: Clifford Dunn
Reported-by: Craig Moore <[email protected]>
Signed-off-by: Stefan Richter <[email protected]>
[bwh: Backported to 3.2: log with fw_notify() instead of ohci_notice()]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/firewire/ohci.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -3547,6 +3547,11 @@ static int __devinit pci_probe(struct pc

reg_write(ohci, OHCI1394_IsoXmitIntMaskSet, ~0);
ohci->it_context_support = reg_read(ohci, OHCI1394_IsoXmitIntMaskSet);
+ /* JMicron JMB38x often shows 0 at first read, just ignore it */
+ if (!ohci->it_context_support) {
+ fw_notify("overriding IsoXmitIntMask\n");
+ ohci->it_context_support = 0xf;
+ }
reg_write(ohci, OHCI1394_IsoXmitIntMaskClear, ~0);
ohci->it_context_mask = ohci->it_context_support;
ohci->n_it = hweight32(ohci->it_context_mask);

2015-11-24 22:41:38

by Ben Hutchings

Subject: [PATCH 3.2 28/52] recordmcount: Fix endianness handling bug for nop_mcount

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: libin <[email protected]>

commit c84da8b9ad3761eef43811181c7e896e9834b26b upstream.

In nop_mcount, shdr->sh_offset and relp->r_offset should handle
endianness properly, otherwise it will trigger a segmentation fault
if the recordmcount binary and file.o have different endianness.
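
To see why raw header fields cannot be used as offsets, consider this
small sketch. swap32() and read_word() are hypothetical helpers written
for illustration; they are assumed to play the role that _w() plays in
recordmcount, and the real tool's implementation may differ.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8)  |
	       ((x & 0x00ff0000u) >> 8)  |
	       ((x & 0xff000000u) >> 24);
}

/*
 * Hypothetical accessor: convert a field read from the object file to
 * host byte order.  other_endian is non-zero when the object file's
 * endianness differs from the host's.
 */
static uint32_t read_word(uint32_t raw, int other_endian)
{
	return other_endian ? swap32(raw) : raw;
}
```

An offset of 0x100 stored in the opposite byte order reads back as
0x00010000 if it is not converted, which points far outside a small
object file.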

Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: Li Bin <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
scripts/recordmcount.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/scripts/recordmcount.h
+++ b/scripts/recordmcount.h
@@ -375,7 +375,7 @@ static void nop_mcount(Elf_Shdr const *c

if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp)) {
if (make_nop)
- ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset);
+ ret = make_nop((void *)ehdr, _w(shdr->sh_offset) + _w(relp->r_offset));
if (warn_on_notrace_sect && !once) {
printf("Section %s has mcount callers being ignored\n",
txtname);

2015-11-24 22:41:43

by Ben Hutchings

Subject: [PATCH 3.2 27/52] megaraid_sas : SMAP restriction--do not access user memory from IOCTL code

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: "[email protected]" <[email protected]>

commit 323c4a02c631d00851d8edc4213c4d184ef83647 upstream.

This is an issue on SMAP-enabled CPUs when 32-bit applications run on a
64-bit OS. Do not access user memory directly from kernel code: SMAP
restricts such accesses.

Signed-off-by: Sumit Saxena <[email protected]>
Signed-off-by: Kashyap Desai <[email protected]>
Reviewed-by: Tomas Henzl <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/scsi/megaraid/megaraid_sas_base.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -5083,6 +5083,9 @@ static int megasas_mgmt_compat_ioctl_fw(
int i;
int error = 0;
compat_uptr_t ptr;
+ unsigned long local_raw_ptr;
+ u32 local_sense_off;
+ u32 local_sense_len;

if (clear_user(ioc, sizeof(*ioc)))
return -EFAULT;
@@ -5100,9 +5103,15 @@ static int megasas_mgmt_compat_ioctl_fw(
* sense_len is not null, so prepare the 64bit value under
* the same condition.
*/
- if (ioc->sense_len) {
+ if (get_user(local_raw_ptr, ioc->frame.raw) ||
+ get_user(local_sense_off, &ioc->sense_off) ||
+ get_user(local_sense_len, &ioc->sense_len))
+ return -EFAULT;
+
+
+ if (local_sense_len) {
void __user **sense_ioc_ptr =
- (void __user **)(ioc->frame.raw + ioc->sense_off);
+ (void __user **)((u8*)local_raw_ptr + local_sense_off);
compat_uptr_t *sense_cioc_ptr =
(compat_uptr_t *)(cioc->frame.raw + cioc->sense_off);
if (get_user(ptr, sense_cioc_ptr) ||

2015-11-24 22:41:47

by Ben Hutchings

Subject: [PATCH 3.2 26/52] crypto: algif_hash - Only export and import on sockets with data

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Herbert Xu <[email protected]>

commit 4afa5f9617927453ac04b24b584f6c718dfb4f45 upstream.

The hash_accept call fails to work on sockets that have not received
any data. For some algorithm implementations it may cause crashes.

This patch fixes this by ensuring that we only export and import on
sockets that have received data.

Reported-by: Harsh Jain <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Tested-by: Stephan Mueller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
crypto/algif_hash.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -192,9 +192,14 @@ static int hash_accept(struct socket *so
struct sock *sk2;
struct alg_sock *ask2;
struct hash_ctx *ctx2;
+ bool more;
int err;

- err = crypto_ahash_export(req, state);
+ lock_sock(sk);
+ more = ctx->more;
+ err = more ? crypto_ahash_export(req, state) : 0;
+ release_sock(sk);
+
if (err)
return err;

@@ -205,7 +210,10 @@ static int hash_accept(struct socket *so
sk2 = newsock->sk;
ask2 = alg_sk(sk2);
ctx2 = ask2->private;
- ctx2->more = 1;
+ ctx2->more = more;
+
+ if (!more)
+ return err;

err = crypto_ahash_import(&ctx2->req, state);
if (err) {

2015-11-24 22:41:52

by Ben Hutchings

Subject: [PATCH 3.2 25/52] mtd: blkdevs: fix potential deadlock + lockdep warnings

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Brian Norris <[email protected]>

commit f3c63795e90f0c6238306883b6c72f14d5355721 upstream.

Commit 073db4a51ee4 ("mtd: fix: avoid race condition when accessing
mtd->usecount") fixed a race condition but due to poor ordering of the
mutex acquisition, introduced a potential deadlock.

The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
will delete one or more MTDs, along with any corresponding mtdblock
devices. This could potentially race with an acquisition of the block
device as follows.

-> blktrans_open()
-> mutex_lock(&dev->lock);
-> mutex_lock(&mtd_table_mutex);

-> del_mtd_device()
-> mutex_lock(&mtd_table_mutex);
-> blktrans_notify_remove() -> del_mtd_blktrans_dev()
-> mutex_lock(&dev->lock);

This is a classic (potential) ABBA deadlock, which can be fixed by
making the A->B ordering consistent everywhere. There was no real
purpose to the ordering in the original patch, AFAIR, so this shouldn't
be a problem. This ordering was actually already present in
del_mtd_blktrans_dev(), for one, where the function tried to ensure that
its caller already held mtd_table_mutex before it acquired &dev->lock:

if (mutex_trylock(&mtd_table_mutex)) {
mutex_unlock(&mtd_table_mutex);
BUG();
}

So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
we always acquire mtd_table_mutex first.
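
The ordering rule can be illustrated in userspace with POSIX mutexes. This
is a minimal sketch, not the MTD code: table_mutex, dev_lock, dev_open()
and dev_remove() are hypothetical names chosen to mirror the two paths
above.

```c
#include <pthread.h>

static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER; /* lock A */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;    /* lock B */
static int open_count;

/* Open path: take A, then B. */
static void dev_open(void)
{
	pthread_mutex_lock(&table_mutex);
	pthread_mutex_lock(&dev_lock);
	open_count++;
	pthread_mutex_unlock(&dev_lock);
	pthread_mutex_unlock(&table_mutex);
}

/* Removal path: the same A-then-B order, so no ABBA cycle can form. */
static void dev_remove(void)
{
	pthread_mutex_lock(&table_mutex);
	pthread_mutex_lock(&dev_lock);
	open_count = 0;
	pthread_mutex_unlock(&dev_lock);
	pthread_mutex_unlock(&table_mutex);
}
```

Because every path takes the table lock before the device lock, a thread
holding the device lock never waits for the table lock, which is what
breaks the cycle.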

Snippets of the lockdep output follow:

# modprobe -r m25p80
[ 53.419251]
[ 53.420838] ======================================================
[ 53.427300] [ INFO: possible circular locking dependency detected ]
[ 53.433865] 4.3.0-rc6 #96 Not tainted
[ 53.437686] -------------------------------------------------------
[ 53.444220] modprobe/372 is trying to acquire lock:
[ 53.449320] (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
[ 53.457271]
[ 53.457271] but task is already holding lock:
[ 53.463372] (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100
[ 53.471321]
[ 53.471321] which lock already depends on the new lock.
[ 53.471321]
[ 53.479856]
[ 53.479856] the existing dependency chain (in reverse order) is:
[ 53.487660]
-> #1 (mtd_table_mutex){+.+.+.}:
[ 53.492331] [<c043fc5c>] blktrans_open+0x34/0x1a4
[ 53.497879] [<c01afce0>] __blkdev_get+0xc4/0x3b0
[ 53.503364] [<c01b0bb8>] blkdev_get+0x108/0x320
[ 53.508743] [<c01713c0>] do_dentry_open+0x218/0x314
[ 53.514496] [<c0180454>] path_openat+0x4c0/0xf9c
[ 53.519959] [<c0182044>] do_filp_open+0x5c/0xc0
[ 53.525336] [<c0172758>] do_sys_open+0xfc/0x1cc
[ 53.530716] [<c000f740>] ret_fast_syscall+0x0/0x1c
[ 53.536375]
-> #0 (&new->lock){+.+...}:
[ 53.540587] [<c063f124>] mutex_lock_nested+0x38/0x3cc
[ 53.546504] [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
[ 53.552606] [<c043f164>] blktrans_notify_remove+0x7c/0x84
[ 53.558891] [<c04399f0>] del_mtd_device+0x74/0x100
[ 53.564544] [<c043c670>] del_mtd_partitions+0x80/0xc8
[ 53.570451] [<c0439aa0>] mtd_device_unregister+0x24/0x48
[ 53.576637] [<c046ce6c>] spi_drv_remove+0x1c/0x34
[ 53.582207] [<c03de0f0>] __device_release_driver+0x88/0x114
[ 53.588663] [<c03de19c>] device_release_driver+0x20/0x2c
[ 53.594843] [<c03dd9e8>] bus_remove_device+0xd8/0x108
[ 53.600748] [<c03dacc0>] device_del+0x10c/0x210
[ 53.606127] [<c03dadd0>] device_unregister+0xc/0x20
[ 53.611849] [<c046d878>] __unregister+0x10/0x20
[ 53.617211] [<c03da868>] device_for_each_child+0x50/0x7c
[ 53.623387] [<c046eae8>] spi_unregister_master+0x58/0x8c
[ 53.629578] [<c03e12f0>] release_nodes+0x15c/0x1c8
[ 53.635223] [<c03de0f8>] __device_release_driver+0x90/0x114
[ 53.641689] [<c03de900>] driver_detach+0xb4/0xb8
[ 53.647147] [<c03ddc78>] bus_remove_driver+0x4c/0xa0
[ 53.652970] [<c00cab50>] SyS_delete_module+0x11c/0x1e4
[ 53.658976] [<c000f740>] ret_fast_syscall+0x0/0x1c
[ 53.664621]
[ 53.664621] other info that might help us debug this:
[ 53.664621]
[ 53.672979] Possible unsafe locking scenario:
[ 53.672979]
[ 53.679169] CPU0 CPU1
[ 53.683900] ---- ----
[ 53.688633] lock(mtd_table_mutex);
[ 53.692383] lock(&new->lock);
[ 53.698306] lock(mtd_table_mutex);
[ 53.704658] lock(&new->lock);
[ 53.707946]
[ 53.707946] *** DEADLOCK ***

Fixes: 073db4a51ee4 ("mtd: fix: avoid race condition when accessing mtd->usecount")
Reported-by: Felipe Balbi <[email protected]>
Tested-by: Felipe Balbi <[email protected]>
Signed-off-by: Brian Norris <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/mtd/mtd_blkdevs.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -213,8 +213,8 @@ static int blktrans_open(struct block_de
if (!dev)
return -ERESTARTSYS; /* FIXME: busy loop! -arnd*/

- mutex_lock(&dev->lock);
mutex_lock(&mtd_table_mutex);
+ mutex_lock(&dev->lock);

if (dev->open)
goto unlock;
@@ -237,8 +237,8 @@ static int blktrans_open(struct block_de

unlock:
dev->open++;
- mutex_unlock(&mtd_table_mutex);
mutex_unlock(&dev->lock);
+ mutex_unlock(&mtd_table_mutex);
blktrans_dev_put(dev);
return ret;

@@ -248,8 +248,8 @@ error_release:
error_put:
module_put(dev->tr->owner);
kref_put(&dev->ref, blktrans_dev_release);
- mutex_unlock(&mtd_table_mutex);
mutex_unlock(&dev->lock);
+ mutex_unlock(&mtd_table_mutex);
blktrans_dev_put(dev);
return ret;
}
@@ -262,8 +262,8 @@ static int blktrans_release(struct gendi
if (!dev)
return ret;

- mutex_lock(&dev->lock);
mutex_lock(&mtd_table_mutex);
+ mutex_lock(&dev->lock);

if (--dev->open)
goto unlock;
@@ -276,8 +276,8 @@ static int blktrans_release(struct gendi
__put_mtd_device(dev->mtd);
}
unlock:
- mutex_unlock(&mtd_table_mutex);
mutex_unlock(&dev->lock);
+ mutex_unlock(&mtd_table_mutex);
blktrans_dev_put(dev);
return ret;
}
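
The ordering rule that the hunks above restore can be sketched in user space. This is a minimal illustration that assumes POSIX threads; `table_mutex`, `dev_lock`, `sketch_open()` and `sketch_release()` are hypothetical stand-ins, not kernel code. Every path takes the table-wide lock first and the per-device lock second, so the inverted order reported by lockdep cannot occur.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for mtd_table_mutex and dev->lock. */
static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static int open_count;

/* Take table_mutex first, then dev_lock; unlock in reverse order. */
int sketch_open(void)
{
	int n;

	pthread_mutex_lock(&table_mutex);
	pthread_mutex_lock(&dev_lock);
	n = ++open_count;
	pthread_mutex_unlock(&dev_lock);
	pthread_mutex_unlock(&table_mutex);
	return n;
}

/* Same lock order as sketch_open(), so the two paths cannot deadlock. */
int sketch_release(void)
{
	int n;

	pthread_mutex_lock(&table_mutex);
	pthread_mutex_lock(&dev_lock);
	assert(open_count > 0);
	n = --open_count;
	pthread_mutex_unlock(&dev_lock);
	pthread_mutex_unlock(&table_mutex);
	return n;
}
```

Both functions return the updated open count, which makes the sketch easy to exercise from a small test harness.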

2015-11-24 22:42:02

by Ben Hutchings

Subject: [PATCH 3.2 04/52] HID: core: Avoid uninitialized buffer access

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Richard Purdie <[email protected]>

commit 79b568b9d0c7c5d81932f4486d50b38efdd6da6d upstream.

hid_connect adds various strings to the buffer but they're all
conditional. You can find circumstances where nothing would be written
to it but the kernel will still print the supposedly empty buffer with
printk. This leads to corruption on the console/in the logs.

Ensure buf is initialized to an empty string.

Signed-off-by: Richard Purdie <[email protected]>
[dvhart: Initialize string to "" rather than assign buf[0] = NULL;]
Cc: Jiri Kosina <[email protected]>
Cc: [email protected]
Signed-off-by: Darren Hart <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/hid/hid-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -1295,7 +1295,7 @@ int hid_connect(struct hid_device *hdev,
"Multi-Axis Controller"
};
const char *type, *bus;
- char buf[64];
+ char buf[64] = "";
unsigned int i;
int len;
int ret;
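
The failure mode fixed here can be reproduced in a small user-space sketch. The flag names and the `describe()` helper are hypothetical; only the `char buf[64] = "";` initializer mirrors the patch. When no flag is set, none of the conditional writes run, and the initializer is what guarantees an empty string rather than stack garbage.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical feature bits, loosely modelled on the claimed HID interfaces. */
#define SKETCH_HAS_INPUT  0x1
#define SKETCH_HAS_HIDDEV 0x2

/* Build a comma-separated description; returns the length written to out. */
size_t describe(unsigned int flags, char *out, size_t outlen)
{
	char buf[64] = "";	/* stays a valid empty string if no branch runs */
	size_t len = 0;

	assert(out && outlen > 0);
	if (flags & SKETCH_HAS_INPUT)
		len += (size_t)snprintf(buf + len, sizeof(buf) - len, "input");
	if (flags & SKETCH_HAS_HIDDEV)
		len += (size_t)snprintf(buf + len, sizeof(buf) - len,
					"%shiddev", len ? "," : "");
	snprintf(out, outlen, "%s", buf);
	return strlen(out);
}
```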

2015-11-24 22:42:12

by Ben Hutchings

Subject: [PATCH 3.2 42/52] FS-Cache: Increase reference of parent after registering, netfs success

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Kinglong Mee <[email protected]>

commit 86108c2e34a26e4bec3c6ddb23390bf8cedcf391 upstream.

If the netfs is already registered, fscache should not increase the
parent's usage and n_children counts; otherwise they are never decreased.

v2: thanks to David's suggestion,
move the increment of the parent's reference counts to the success path
use kmem_cache_free() to free primary_index directly

v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;"

Signed-off-by: Kinglong Mee <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/fscache/netfs.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

--- a/fs/fscache/netfs.c
+++ b/fs/fscache/netfs.c
@@ -45,9 +45,6 @@ int __fscache_register_netfs(struct fsca
netfs->primary_index->parent = &fscache_fsdef_index;
netfs->primary_index->netfs_data = netfs;

- atomic_inc(&netfs->primary_index->parent->usage);
- atomic_inc(&netfs->primary_index->parent->n_children);
-
spin_lock_init(&netfs->primary_index->lock);
INIT_HLIST_HEAD(&netfs->primary_index->backing_objects);

@@ -60,6 +57,9 @@ int __fscache_register_netfs(struct fsca
goto already_registered;
}

+ atomic_inc(&netfs->primary_index->parent->usage);
+ atomic_inc(&netfs->primary_index->parent->n_children);
+
list_add(&netfs->link, &fscache_netfs_list);
ret = 0;

@@ -70,8 +70,7 @@ already_registered:
up_write(&fscache_addremove_sem);

if (ret < 0) {
- netfs->primary_index->parent = NULL;
- __fscache_cookie_put(netfs->primary_index);
+ kmem_cache_free(fscache_cookie_jar, netfs->primary_index);
netfs->primary_index = NULL;
}
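
The pattern applied above (take the parent's references only once registration can no longer fail) can be sketched as follows. The `struct parent`, the fixed-size name table and `register_netfs()` are hypothetical simplifications, not the fscache API.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_NETFS 4

struct parent {
	int usage;
	int n_children;
};

static const char *registered[SKETCH_MAX_NETFS];
static int nr_registered;

/* Register a name; the parent's counts change only on success. */
int register_netfs(struct parent *p, const char *name)
{
	int i;

	assert(p && name);
	for (i = 0; i < nr_registered; i++)
		if (strcmp(registered[i], name) == 0)
			return -EEXIST;	/* duplicate: counts untouched */
	if (nr_registered == SKETCH_MAX_NETFS)
		return -ENOSPC;

	/* Past the last failure point: now it is safe to take references. */
	p->usage++;
	p->n_children++;
	registered[nr_registered++] = name;
	return 0;
}
```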

2015-11-24 22:42:15

by Ben Hutchings

Subject: [PATCH 3.2 38/52] Btrfs: fix race when listing an inode's xattrs

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Filipe Manana <[email protected]>

commit f1cd1f0b7d1b5d4aaa5711e8f4e4898b0045cb6d upstream.

When listing an inode's xattrs we have a time window where we race against
a concurrent operation for adding a new hard link for our inode that makes
us not return any xattr to user space. In order for this to happen, the
first xattr of our inode needs to be at slot 0 of a leaf and the previous
leaf must still have room for an inode ref (or extref) item, and this can
happen because an inode's listxattrs callback does not lock the inode's
i_mutex (nor does the VFS do it for us), but adding a hard link to an
inode makes the VFS lock the inode's i_mutex before calling the inode's
link callback.

If we have the following leafs:

Leaf X (has N items)                             Leaf Y

[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]   [ (257 XATTR_ITEM 12345), ... ]
          slot N - 2         slot N - 1                    slot 0

The race illustrated by the following sequence diagram is possible:

          CPU 1                                         CPU 2

 btrfs_listxattr()

   searches for key (257 XATTR_ITEM 0)

   gets path with path->nodes[0] == leaf X
   and path->slots[0] == N

   because path->slots[0] is >=
   btrfs_header_nritems(leaf X), it calls
   btrfs_next_leaf()

   btrfs_next_leaf()
     releases the path

                                                 adds key (257 INODE_REF 666)
                                                 to the end of leaf X (slot N),
                                                 and leaf X now has N + 1 items

     searches for the key (257 INODE_REF 256),
     with path->keep_locks == 1, because that
     is the last key it saw in leaf X before
     releasing the path

     ends up at leaf X again and it verifies
     that the key (257 INODE_REF 256) is no
     longer the last key in leaf X, so it
     returns with path->nodes[0] == leaf X
     and path->slots[0] == N, pointing to
     the new item with key (257 INODE_REF 666)

   btrfs_listxattr's loop iteration sees that
   the type of the key pointed by the path is
   different from the type BTRFS_XATTR_ITEM_KEY
   and so it breaks the loop and stops looking
   for more xattr items
     --> the application doesn't get any xattr
         listed for our inode

So fix this by breaking the loop only if the key's type is greater than
BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.

Signed-off-by: Filipe Manana <[email protected]>
[bwh: Backported to 3.2: s/found_key\.type/btrfs_key_type(\&found_key)/]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/btrfs/xattr.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -259,8 +259,10 @@ ssize_t btrfs_listxattr(struct dentry *d
/* check to make sure this item is what we want */
if (found_key.objectid != key.objectid)
break;
- if (btrfs_key_type(&found_key) != BTRFS_XATTR_ITEM_KEY)
+ if (btrfs_key_type(&found_key) > BTRFS_XATTR_ITEM_KEY)
break;
+ if (btrfs_key_type(&found_key) < BTRFS_XATTR_ITEM_KEY)
+ goto next;

di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
if (verify_dir_item(root, leaf, di))
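
The revised loop condition can be modelled over a sorted array of keys. This is only a sketch: `struct sketch_key`, the enum names and `count_xattrs()` are hypothetical, and the array stands in for walking leaf slots. Keys whose type sorts before the xattr type are skipped, and the walk stops only at a larger type or a different objectid.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative key types; only their relative order matters here. */
enum { SKETCH_INODE_REF = 12, SKETCH_XATTR_ITEM = 24, SKETCH_DIR_ITEM = 84 };

struct sketch_key {
	unsigned long long objectid;
	int type;
};

/* Count xattr keys of inode ino, starting the walk at slot start. */
size_t count_xattrs(const struct sketch_key *keys, size_t n, size_t start,
		    unsigned long long ino)
{
	size_t i, found = 0;

	assert(keys || n == 0);
	for (i = start; i < n; i++) {
		if (keys[i].objectid != ino)
			break;
		if (keys[i].type > SKETCH_XATTR_ITEM)
			break;		/* past all xattrs: stop */
		if (keys[i].type < SKETCH_XATTR_ITEM)
			continue;	/* e.g. a newly added INODE_REF: skip */
		found++;
	}
	return found;
}
```

Starting the walk on a freshly inserted INODE_REF key, as in the race described in the commit message, still reaches the xattr keys that follow it.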

2015-11-24 22:42:24

by Ben Hutchings

Subject: [PATCH 3.2 41/52] KVM: svm: unconditionally intercept #DB

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <[email protected]>

commit cbdb967af3d54993f5814f1cee0ed311a055377d upstream.

This is needed to avoid the possibility that the guest triggers
an infinite stream of #DB exceptions (CVE-2015-8104).

VMX is not affected: because it does not save DR6 in the VMCS,
it already intercepts #DB unconditionally.

Reported-by: Jan Beulich <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[bwh: Backported to 3.2: #DB and #BP did not share a function, and there is
no operation pointer referring to it, so remove update_db_intercept()
entirely]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/kvm/svm.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)

--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1015,6 +1015,7 @@ static void init_vmcb(struct vcpu_svm *s
set_exception_intercept(svm, UD_VECTOR);
set_exception_intercept(svm, MC_VECTOR);
set_exception_intercept(svm, AC_VECTOR);
+ set_exception_intercept(svm, DB_VECTOR);

set_intercept(svm, INTERCEPT_INTR);
set_intercept(svm, INTERCEPT_NMI);
@@ -1550,26 +1551,6 @@ static void svm_set_segment(struct kvm_v
mark_dirty(svm->vmcb, VMCB_SEG);
}

-static void update_db_intercept(struct kvm_vcpu *vcpu)
-{
- struct vcpu_svm *svm = to_svm(vcpu);
-
- clr_exception_intercept(svm, DB_VECTOR);
- clr_exception_intercept(svm, BP_VECTOR);
-
- if (svm->nmi_singlestep)
- set_exception_intercept(svm, DB_VECTOR);
-
- if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
- if (vcpu->guest_debug &
- (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
- set_exception_intercept(svm, DB_VECTOR);
- if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
- set_exception_intercept(svm, BP_VECTOR);
- } else
- vcpu->guest_debug = 0;
-}
-
static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -1580,8 +1561,6 @@ static void svm_guest_debug(struct kvm_v
svm->vmcb->save.dr7 = vcpu->arch.dr7;

mark_dirty(svm->vmcb, VMCB_DR);
-
- update_db_intercept(vcpu);
}

static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
@@ -1655,7 +1634,6 @@ static int db_interception(struct vcpu_s
if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
svm->vmcb->save.rflags &=
~(X86_EFLAGS_TF | X86_EFLAGS_RF);
- update_db_intercept(&svm->vcpu);
}

if (svm->vcpu.guest_debug &
@@ -3557,7 +3535,6 @@ static void enable_nmi_window(struct kvm
*/
svm->nmi_singlestep = true;
svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
- update_db_intercept(vcpu);
}

static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)

2015-11-24 22:42:19

by Ben Hutchings

Subject: [PATCH 3.2 50/52] ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context.

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Ani Sinha <[email protected]>

[ Upstream commit 44f49dd8b5a606870a1f21101522a0f9c4414784 ]

Fixes the following kernel BUG :

BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758
caller is __this_cpu_preempt_check+0x13/0x15
CPU: 0 PID: 2758 Comm: bash Tainted: P O 3.18.19 #2
ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000
0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800
ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8
Call Trace:
[<ffffffff81482b2a>] dump_stack+0x52/0x80
[<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1
[<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15
[<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c
[<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e
[<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810e6974>] ? pollwake+0x4d/0x51
[<ffffffff81058ac0>] ? default_wake_function+0x0/0xf
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810613d9>] ? __wake_up_common+0x45/0x77
[<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32
[<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53
[<ffffffff8139a519>] ? sock_def_readable+0x71/0x75
[<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55
[<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41
[<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86
[<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d
[<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b
[<ffffffff810d5738>] ? new_sync_read+0x82/0xaa
[<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99
[<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32
[<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f
[<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf
[<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3
[<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e

Signed-off-by: Ani Sinha <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[bwh: Backported to 3.2: ipmr doesn't implement IPSTATS_MIB_OUTOCTETS]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1590,7 +1590,7 @@ static inline int ipmr_forward_finish(st
{
struct ip_options *opt = &(IPCB(skb)->opt);

- IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
+ IP_INC_STATS(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);

if (unlikely(opt->optlen))
ip_forward_options(skb);
@@ -1652,7 +1652,7 @@ static void ipmr_queue_xmit(struct net *
* to blackhole.
*/

- IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+ IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
ip_rt_put(rt);
goto out_free;
}

2015-11-24 22:42:08

by Ben Hutchings

Subject: [PATCH 3.2 40/52] net: fix a race in dst_release()

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

commit d69bbf88c8d0b367cf3e3a052f6daadf630ee566 upstream.

Only the CPU that sees the dst refcount drop to 0 can safely
dereference dst->flags.

Otherwise another CPU might already have freed the dst.

Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst")
Reported-by: Greg Thelen <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
net/core/dst.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -272,7 +272,7 @@ void dst_release(struct dst_entry *dst)

newrefcnt = atomic_dec_return(&dst->__refcnt);
WARN_ON(newrefcnt < 0);
- if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt) {
+ if (!newrefcnt && unlikely(dst->flags & DST_NOCACHE)) {
dst = dst_destroy(dst);
if (dst)
__dst_free(dst);
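
The reordered condition can be illustrated with C11 atomics. This is a user-space sketch, not the kernel's dst code: `struct sketch_obj`, `SKETCH_NOCACHE`, `obj_new()` and `obj_release()` are hypothetical. The key point is that the object's fields are read only after the caller has observed the count reaching zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_NOCACHE 0x1u

struct sketch_obj {
	atomic_int refcnt;
	unsigned int flags;
};

struct sketch_obj *obj_new(unsigned int flags, int refs)
{
	struct sketch_obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refcnt, refs);
	o->flags = flags;
	return o;
}

/* Drop one reference; returns true if this call freed the object. */
bool obj_release(struct sketch_obj *o)
{
	int newref = atomic_fetch_sub(&o->refcnt, 1) - 1;

	assert(newref >= 0);
	/* Test newref first: only the last holder may touch o->flags. */
	if (newref == 0 && (o->flags & SKETCH_NOCACHE)) {
		free(o);
		return true;
	}
	return false;
}
```

In this sketch, objects without `SKETCH_NOCACHE` are assumed to be freed by some other path that is not modelled here.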

2015-11-24 22:42:06

by Ben Hutchings

Subject: [PATCH 3.2 23/52] megaraid_sas: Do not use PAGE_SIZE for max_sectors

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: "[email protected]" <[email protected]>

commit 357ae967ad66e357f78b5cfb5ab6ca07fb4a7758 upstream.

Do not use the PAGE_SIZE macro to calculate max_sectors per I/O
request. The driver code assumes PAGE_SIZE is always 4096, which
leads to a wrongly calculated value if PAGE_SIZE is not 4096. This issue
was reported in Ubuntu Bugzilla Bug #1475166.

Signed-off-by: Sumit Saxena <[email protected]>
Signed-off-by: Kashyap Desai <[email protected]>
Reviewed-by: Tomas Henzl <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/scsi/megaraid/megaraid_sas.h | 2 ++
drivers/scsi/megaraid/megaraid_sas_base.c | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/scsi/megaraid/megaraid_sas.h
+++ b/drivers/scsi/megaraid/megaraid_sas.h
@@ -300,6 +300,8 @@ enum MR_EVT_ARGS {
MR_EVT_ARGS_GENERIC,
};

+
+#define SGE_BUFFER_SIZE 4096
/*
* define constants for device list query options
*/
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3636,7 +3636,7 @@ static int megasas_init_fw(struct megasa
}

instance->max_sectors_per_req = instance->max_num_sge *
- PAGE_SIZE / 512;
+ SGE_BUFFER_SIZE / 512;
if (tmp_sectors && (instance->max_sectors_per_req > tmp_sectors))
instance->max_sectors_per_req = tmp_sectors;
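
The effect of the constant on the computed limit can be checked with a little arithmetic. The helper name and the SGE count of 64 used in the note below are assumptions chosen for illustration; only the formula (number of SGEs times buffer size, divided by the 512-byte sector size) comes from the hunk above.

```c
#include <assert.h>

/* Sectors per request for num_sge scatter-gather entries of buf_size bytes. */
unsigned int max_sectors(unsigned int num_sge, unsigned int buf_size)
{
	assert(buf_size % 512 == 0);
	return num_sge * buf_size / 512;
}
```

With 64 entries, a 4096-byte buffer gives 512 sectors, while substituting a 64 KiB PAGE_SIZE would give 8192 sectors, sixteen times more than the 4 KiB per entry assumed elsewhere in the driver.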

2015-11-24 22:41:56

by Ben Hutchings

Subject: [PATCH 3.2 33/52] x86/cpu: Call verify_cpu() after having entered long mode too

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <[email protected]>

commit 04633df0c43d710e5f696b06539c100898678235 upstream.

When we get loaded by a 64-bit bootloader, kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.

For example, some Dell BIOSes have this XD disable feature which sets
IA32_MISC_ENABLE[34] and disables NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.

A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:

# rdmsr -a 0x1a0
400850089
850089
850089
850089

I know, right?!

There's not even an off switch in there.

So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.

Requested-and-debugged-by: "H. Peter Anvin" <[email protected]>
Reported-and-tested-by: Bastien Nocera <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
arch/x86/kernel/head_64.S | 8 ++++++++
arch/x86/kernel/verify_cpu.S | 12 +++++++-----
2 files changed, 15 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -63,6 +63,9 @@ startup_64:
* tables and then reload them.
*/

+ /* Sanitize CPU configuration */
+ call verify_cpu
+
/* Compute the delta between the address I am compiled to run at and the
* address I am actually running at.
*/
@@ -160,6 +163,9 @@ ENTRY(secondary_startup_64)
* after the boot processor executes this code.
*/

+ /* Sanitize CPU configuration */
+ call verify_cpu
+
/* Enable PAE mode and PGE */
movl $(X86_CR4_PAE | X86_CR4_PGE), %eax
movq %rax, %cr4
@@ -253,6 +259,8 @@ ENTRY(secondary_startup_64)
pushq %rax # target address in negative space
lretq

+#include "verify_cpu.S"
+
/* SMP bootup changes these two */
__REFDATA
.align 8
--- a/arch/x86/kernel/verify_cpu.S
+++ b/arch/x86/kernel/verify_cpu.S
@@ -34,10 +34,11 @@
#include <asm/msr-index.h>

verify_cpu:
- pushfl # Save caller passed flags
- pushl $0 # Kill any dangerous flags
- popfl
+ pushf # Save caller passed flags
+ push $0 # Kill any dangerous flags
+ popf

+#ifndef __x86_64__
pushfl # standard way to check for cpuid
popl %eax
movl %eax,%ebx
@@ -48,6 +49,7 @@ verify_cpu:
popl %eax
cmpl %eax,%ebx
jz verify_cpu_no_longmode # cpu has no cpuid
+#endif

movl $0x0,%eax # See if cpuid 1 is implemented
cpuid
@@ -130,10 +132,10 @@ verify_cpu_sse_test:
jmp verify_cpu_sse_test # try again

verify_cpu_no_longmode:
- popfl # Restore caller passed flags
+ popf # Restore caller passed flags
movl $1,%eax
ret
verify_cpu_sse_ok:
- popfl # Restore caller passed flags
+ popf # Restore caller passed flags
xorl %eax, %eax
ret
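
The rdmsr values quoted in the commit message can be decoded with a short helper. This is a sketch with hypothetical function names; the only facts taken from the message are that the register is IA32_MISC_ENABLE and that bit 34 is the XD-disable bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_XD_DISABLE_BIT 34

/* True if the XD-disable bit is set in an IA32_MISC_ENABLE value. */
bool xd_disabled(uint64_t misc_enable)
{
	return (misc_enable >> SKETCH_XD_DISABLE_BIT) & 1;
}

/* Return the value with the XD-disable bit cleared. */
uint64_t clear_xd_disable(uint64_t misc_enable)
{
	uint64_t v = misc_enable & ~(UINT64_C(1) << SKETCH_XD_DISABLE_BIT);

	assert(!xd_disabled(v));
	return v;
}
```

Applied to the output above, 0x400850089 (the BSP) has the bit set and 0x850089 (the other CPUs) does not.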

2015-11-24 22:41:31

by Ben Hutchings

Subject: [PATCH 3.2 36/52] perf: Fix inherited events vs. tracepoint filters

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <[email protected]>

commit b71b437eedaed985062492565d9d421d975ae845 upstream.

Arnaldo reported that tracepoint filters seem to misbehave (i.e. do not
apply) on inherited events.

The fix is obvious; filters are only set on the actual (parent)
event, use the normal pattern of using this parent event for filters.
This is safe because each child event has a reference to it.

Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
kernel/events/core.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5393,6 +5393,10 @@ static int perf_tp_filter_match(struct p
{
void *record = data->raw->data;

+ /* only top level events have filters set */
+ if (event->parent)
+ event = event->parent;
+
if (likely(!event->filter) || filter_match_preds(event->filter, record))
return 1;
return 0;
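
The parent redirection can be modelled with a two-level event structure. Everything in this sketch is hypothetical (`struct sketch_event`, the `int` record, the `only_even()` filter); it only shows that a child with no filter of its own ends up evaluated against its parent's filter.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_filter_fn)(int record);

struct sketch_event {
	struct sketch_event *parent;	/* NULL for a top-level event */
	sketch_filter_fn filter;	/* set on top-level events only */
};

/* Example filter: accept even records. */
int only_even(int record)
{
	return record % 2 == 0;
}

/* Returns 1 if the record passes the (parent's) filter, else 0. */
int tp_filter_match(const struct sketch_event *event, int record)
{
	assert(event != NULL);
	/* Only top-level events carry a filter, so consult the parent. */
	if (event->parent)
		event = event->parent;

	if (!event->filter || event->filter(record))
		return 1;
	return 0;
}
```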

2015-11-24 22:41:28

by Ben Hutchings

Subject: [PATCH 3.2 24/52] can: Use correct type in sizeof() in nla_put()

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Marek Vasut <[email protected]>

commit 562b103a21974c2f9cd67514d110f918bb3e1796 upstream.

The sizeof() is invoked on an incorrect variable, likely due to some
copy-paste error, and this might result in memory corruption. Fix this.

Signed-off-by: Marek Vasut <[email protected]>
Cc: Wolfgang Grandegger <[email protected]>
Cc: [email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
[bwh: Backported to 3.2:
- Keep using the old NLA_PUT macro
- Adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/can/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -693,7 +693,7 @@ static int can_fill_info(struct sk_buff
NLA_PUT_U32(skb, IFLA_CAN_RESTART_MS, priv->restart_ms);
NLA_PUT(skb, IFLA_CAN_BITTIMING,
sizeof(priv->bittiming), &priv->bittiming);
- NLA_PUT(skb, IFLA_CAN_CLOCK, sizeof(cm), &priv->clock);
+ NLA_PUT(skb, IFLA_CAN_CLOCK, sizeof(priv->clock), &priv->clock);
if (priv->do_get_berr_counter && !priv->do_get_berr_counter(dev, &bec))
NLA_PUT(skb, IFLA_CAN_BERR_COUNTER, sizeof(bec), &bec);
if (priv->bittiming_const)
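
The size mismatch is easy to see outside the kernel. In this sketch the two structs are simplified copies of the CAN netlink structures, and `put_attr()` and `put_clock()` are hypothetical helpers standing in for the NLA_PUT macro. Taking `sizeof` from the object being copied keeps the length and the source pointer in agreement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_can_clock {
	uint32_t freq;
};

struct sketch_can_ctrlmode {
	uint32_t mask;
	uint32_t flags;
};

/* Copy len bytes of src into dst if they fit; returns bytes written. */
size_t put_attr(unsigned char *dst, size_t room, const void *src, size_t len)
{
	assert(dst && src);
	if (len > room)
		return 0;
	memcpy(dst, src, len);
	return len;
}

/* The length is derived from the same object that is being copied. */
size_t put_clock(unsigned char *dst, size_t room,
		 const struct sketch_can_clock *clock)
{
	return put_attr(dst, room, clock, sizeof(*clock));
}
```

Using the size of the control-mode struct instead would read four bytes past the end of the clock struct in this layout.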

2015-11-24 22:46:57

by Ben Hutchings

Subject: [PATCH 3.2 16/52] ext4, jbd2: ensure entering into panic after recording an error in superblock

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Daeho Jeong <[email protected]>

commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 upstream.

If an ext4 filesystem uses JBD2 journaling and an error occurs, the
journal is aborted first, the error number is recorded in the JBD2
superblock and, finally, the system panics if the "errors=panic"
option is set. In rare cases, however, this sequence is reordered as
shown in the figure below, and the system panics (which means a system
reset in a mobile environment) before the error has been recorded in
the journal superblock. In that case, e2fsck cannot tell that a
filesystem failure occurred in the previous run, and the corruption
is not fixed.

Task A                                  Task B
ext4_handle_error()
-> jbd2_journal_abort()
  -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    |                                   __ext4_abort()
    |                                   -> jbd2_journal_abort()
    |                                   | -> __journal_abort_soft()
    |                                   |   -> if (journal->j_flags & JBD2_ABORT)
    |                                   |        return;
    |                                   -> panic()
    |
    -> jbd2_journal_update_sb_errno()

Tested-by: Hobin Woo <[email protected]>
Signed-off-by: Daeho Jeong <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/ext4/super.c | 12 ++++++++++--
fs/jbd2/journal.c | 6 +++++-
include/linux/jbd2.h | 1 +
3 files changed, 16 insertions(+), 3 deletions(-)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -463,9 +463,13 @@ static void ext4_handle_error(struct sup
ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
sb->s_flags |= MS_RDONLY;
}
- if (test_opt(sb, ERRORS_PANIC))
+ if (test_opt(sb, ERRORS_PANIC)) {
+ if (EXT4_SB(sb)->s_journal &&
+ !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
+ return;
panic("EXT4-fs (device %s): panic forced after error\n",
sb->s_id);
+ }
}

void __ext4_error(struct super_block *sb, const char *function,
@@ -628,8 +632,12 @@ void __ext4_abort(struct super_block *sb
jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
save_error_info(sb, function, line);
}
- if (test_opt(sb, ERRORS_PANIC))
+ if (test_opt(sb, ERRORS_PANIC)) {
+ if (EXT4_SB(sb)->s_journal &&
+ !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
+ return;
panic("EXT4-fs panic from previous error\n");
+ }
}

void ext4_msg(struct super_block *sb, const char *prefix, const char *fmt, ...)
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1956,8 +1956,12 @@ static void __journal_abort_soft (journa

__jbd2_journal_abort_hard(journal);

- if (errno)
+ if (errno) {
jbd2_journal_update_sb_errno(journal);
+ write_lock(&journal->j_state_lock);
+ journal->j_flags |= JBD2_REC_ERR;
+ write_unlock(&journal->j_state_lock);
+ }
}

/**
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -954,6 +954,7 @@ struct journal_s
#define JBD2_ABORT_ON_SYNCDATA_ERR 0x040 /* Abort the journal on file
* data write error in ordered
* mode */
+#define JBD2_REC_ERR 0x080 /* The errno in the sb has been recorded */

/*
* Function declarations for the journaling transaction and buffer
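
The gating introduced by JBD2_REC_ERR can be sketched as a small state machine. The struct and function names are hypothetical and locking is omitted; the sketch only shows that the panic decision waits until the errno has been recorded.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ABORT   0x002u	/* journal has been aborted */
#define SKETCH_REC_ERR 0x080u	/* errno has been recorded in the sb */

struct sketch_journal {
	unsigned int flags;
	int sb_errno;
};

/* Abort the journal and record the error; a second abort is a no-op. */
void journal_abort(struct sketch_journal *j, int err)
{
	assert(j != NULL);
	if (j->flags & SKETCH_ABORT)
		return;
	j->flags |= SKETCH_ABORT;
	if (err) {
		j->sb_errno = err;
		j->flags |= SKETCH_REC_ERR;	/* set only after recording */
	}
}

/* With a journal, panic is allowed only once the errno is recorded. */
int may_panic(const struct sketch_journal *j)
{
	return !j || (j->flags & SKETCH_REC_ERR) != 0;
}
```

A task that finds the journal already aborted but not yet marked as recorded therefore returns without panicking, leaving the panic to the task that completes the recording.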

2015-11-24 22:47:02

by Ben Hutchings

Subject: [PATCH 3.2 14/52] Btrfs: don't use ram_bytes for uncompressed inline items

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Chris Mason <[email protected]>

commit 514ac8ad8793a097c0c9d89202c642479d6dfa34 upstream.

If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
the new size. The fix uses the size directly from the item header when
reading uncompressed inlines, and also fixes truncate to update the
size as it goes.

Reported-by: Jens Axboe <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
[bwh: Backported to 3.2:
- Don't use btrfs_map_token API
- There are fewer callers of btrfs_file_extent_inline_len() to change
- Adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2174,15 +2174,6 @@ BTRFS_SETGET_FUNCS(file_extent_encryptio
BTRFS_SETGET_FUNCS(file_extent_other_encoding, struct btrfs_file_extent_item,
other_encoding, 16);

-/* this returns the number of file bytes represented by the inline item.
- * If an item is compressed, this is the uncompressed size
- */
-static inline u32 btrfs_file_extent_inline_len(struct extent_buffer *eb,
- struct btrfs_file_extent_item *e)
-{
- return btrfs_file_extent_ram_bytes(eb, e);
-}
-
/*
* this returns the number of bytes used by the item on disk, minus the
* size of any extent headers. If a file is compressed on disk, this is
@@ -2196,6 +2187,29 @@ static inline u32 btrfs_file_extent_inli
return btrfs_item_size(eb, e) - offset;
}

+/* this returns the number of file bytes represented by the inline item.
+ * If an item is compressed, this is the uncompressed size
+ */
+static inline u32 btrfs_file_extent_inline_len(struct extent_buffer *eb,
+ int slot,
+ struct btrfs_file_extent_item *fi)
+{
+ /*
+ * return the space used on disk if this item isn't
+ * compressed or encoded
+ */
+ if (btrfs_file_extent_compression(eb, fi) == 0 &&
+ btrfs_file_extent_encryption(eb, fi) == 0 &&
+ btrfs_file_extent_other_encoding(eb, fi) == 0) {
+ return btrfs_file_extent_inline_item_len(eb,
+ btrfs_item_nr(eb, slot));
+ }
+
+ /* otherwise use the ram bytes field */
+ return btrfs_file_extent_ram_bytes(eb, fi);
+}
+
+
static inline struct btrfs_root *btrfs_sb(struct super_block *sb)
{
return sb->s_fs_info;
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -622,7 +622,8 @@ next_slot:
btrfs_file_extent_num_bytes(leaf, fi);
} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
extent_end = key.offset +
- btrfs_file_extent_inline_len(leaf, fi);
+ btrfs_file_extent_inline_len(leaf,
+ path->slots[0], fi);
} else {
WARN_ON(1);
extent_end = search_start;
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1178,7 +1178,8 @@ next_slot:
nocow = 1;
} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
extent_end = found_key.offset +
- btrfs_file_extent_inline_len(leaf, fi);
+ btrfs_file_extent_inline_len(leaf,
+ path->slots[0], fi);
extent_end = ALIGN(extent_end, root->sectorsize);
} else {
BUG_ON(1);
@@ -3095,7 +3096,7 @@ search_again:
btrfs_file_extent_num_bytes(leaf, fi);
} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
item_end += btrfs_file_extent_inline_len(leaf,
- fi);
+ path->slots[0], fi);
}
item_end--;
}
@@ -3161,6 +3162,12 @@ search_again:
inode_sub_bytes(inode, item_end + 1 -
new_size);
}
+
+ /*
+ * update the ram bytes to properly reflect
+ * the new size of our item
+ */
+ btrfs_set_file_extent_ram_bytes(leaf, fi, size);
size =
btrfs_file_extent_calc_inline_size(size);
ret = btrfs_truncate_item(trans, root, path,
@@ -5036,7 +5043,7 @@ again:
btrfs_file_extent_num_bytes(leaf, item);
} else if (found_type == BTRFS_FILE_EXTENT_INLINE) {
size_t size;
- size = btrfs_file_extent_inline_len(leaf, item);
+ size = btrfs_file_extent_inline_len(leaf, path->slots[0], item);
extent_end = (extent_start + size + root->sectorsize - 1) &
~((u64)root->sectorsize - 1);
}
@@ -5103,7 +5110,7 @@ again:
goto out;
}

- size = btrfs_file_extent_inline_len(leaf, item);
+ size = btrfs_file_extent_inline_len(leaf, path->slots[0], item);
extent_offset = page_offset(page) + pg_offset - extent_start;
copy_size = min_t(u64, PAGE_CACHE_SIZE - pg_offset,
size - extent_offset);
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -240,7 +240,7 @@ void btrfs_print_leaf(struct btrfs_root
BTRFS_FILE_EXTENT_INLINE) {
printk(KERN_INFO "\t\tinline extent data "
"size %u\n",
- btrfs_file_extent_inline_len(l, fi));
+ btrfs_file_extent_inline_len(l, i, fi));
break;
}
printk(KERN_INFO "\t\textent data disk bytenr %llu "
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -537,7 +537,7 @@ static noinline int replay_one_extent(st
if (btrfs_file_extent_disk_bytenr(eb, item) == 0)
nbytes = 0;
} else if (found_type == BTRFS_FILE_EXTENT_INLINE) {
- size = btrfs_file_extent_inline_len(eb, item);
+ size = btrfs_file_extent_inline_len(eb, slot, item);
nbytes = btrfs_file_extent_ram_bytes(eb, item);
extent_end = (start + size + mask) & ~mask;
} else {
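
The selection logic of the new btrfs_file_extent_inline_len() can be reduced to a few lines. The flat `struct sketch_inline_extent` is a hypothetical simplification that replaces the extent buffer accessors with plain fields.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_inline_extent {
	unsigned int compression;
	unsigned int encryption;
	unsigned int other_encoding;
	unsigned int ram_bytes;	/* may be stale after a truncate */
	unsigned int item_len;	/* inline data bytes stored in the item */
};

/* File bytes represented by the inline extent. */
unsigned int inline_len(const struct sketch_inline_extent *e)
{
	assert(e != NULL);
	/* Not compressed or encoded: the on-disk size is authoritative. */
	if (e->compression == 0 && e->encryption == 0 &&
	    e->other_encoding == 0)
		return e->item_len;

	/* Otherwise only ram_bytes knows the uncompressed size. */
	return e->ram_bytes;
}
```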

2015-11-24 22:47:09

by Ben Hutchings

Subject: [PATCH 3.2 07/52] mtd: mtdpart: fix add_mtd_partitions error path

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Boris BREZILLON <[email protected]>

commit e5bae86797141e4a95e42d825f737cb36d7b8c37 upstream.

If we fail to allocate a partition structure in the middle of the partition
creation process, the already allocated partitions are never removed, which
means they are still present in the partition list and their resources are
never freed.

Signed-off-by: Boris Brezillon <[email protected]>
Signed-off-by: Brian Norris <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/mtd/mtdpart.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/mtd/mtdpart.c
+++ b/drivers/mtd/mtdpart.c
@@ -671,8 +671,10 @@ int add_mtd_partitions(struct mtd_info *

for (i = 0; i < nbparts; i++) {
slave = allocate_partition(master, parts + i, i, cur_offset);
- if (IS_ERR(slave))
+ if (IS_ERR(slave)) {
+ del_mtd_partitions(master);
return PTR_ERR(slave);
+ }

mutex_lock(&mtd_partitions_mutex);
list_add(&slave->list, &mtd_partitions);
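
The error-path rule (undo everything already added when a later allocation fails) can be sketched in user space. The fixed-size table, `add_parts()`, `del_parts()` and the `fail_at` parameter used to simulate an allocation failure are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_MAX_PARTS 8

static void *parts[SKETCH_MAX_PARTS];
static int nr_parts;

/* Free every partition added so far. */
void del_parts(void)
{
	while (nr_parts > 0)
		free(parts[--nr_parts]);
}

int parts_count(void)
{
	return nr_parts;
}

/* Add n partitions; fail_at simulates an allocation failure (-1: none). */
int add_parts(int n, int fail_at)
{
	int i;

	if (n < 0 || nr_parts + n > SKETCH_MAX_PARTS)
		return -EINVAL;
	for (i = 0; i < n; i++) {
		void *p = (i == fail_at) ? NULL : malloc(16);

		if (!p) {
			del_parts();	/* do not leave a partial list behind */
			return -ENOMEM;
		}
		assert(nr_parts < SKETCH_MAX_PARTS);
		parts[nr_parts++] = p;
	}
	return 0;
}
```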

2015-11-24 22:47:24

by Ben Hutchings

Subject: [PATCH 3.2 52/52] splice: sendfile() at once fails for big files

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Christophe Leroy <[email protected]>

commit 0ff28d9f4674d781e492bcff6f32f0fe48cf0fed upstream.

Using sendfile() with the small program below to get MD5 sums of some files,
it appears that big files (over 64 kbytes on a system with 4k pages) get a
wrong MD5 sum while small files get the correct sum.
This program uses sendfile() to send a file to an AF_ALG socket
for hashing.

/* md5sum2.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/if_alg.h>

int main(int argc, char **argv)
{
	int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
	struct stat st;
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "hash",
		.salg_name = "md5",
	};
	int n;

	bind(sk, (struct sockaddr*)&sa, sizeof(sa));

	for (n = 1; n < argc; n++) {
		int size;
		off_t offset = 0;		/* sendfile() takes an off_t * */
		unsigned char buf[4096];	/* unsigned, so %2.2x prints 2 digits */
		int fd;
		int sko;
		int i;

		fd = open(argv[n], O_RDONLY);
		sko = accept(sk, NULL, 0);
		fstat(fd, &st);
		size = st.st_size;
		/* hash the whole file with a single sendfile() call */
		sendfile(sko, fd, &offset, size);
		/* read back the digest */
		size = read(sko, buf, sizeof(buf));
		for (i = 0; i < size; i++)
			printf("%2.2x", buf[i]);
		printf(" %s\n", argv[n]);
		close(fd);
		close(sko);
	}
	exit(0);
}

The test below was done using official Linux patch files. The first result
is with a software-based md5sum. The second result is with the program above.

root@vgoip:~# ls -l patch-3.6.*
-rw-r--r-- 1 root root 64011 Aug 24 12:01 patch-3.6.2.gz
-rw-r--r-- 1 root root 94131 Aug 24 12:01 patch-3.6.3.gz

root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz

root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
5fd77b24e68bb24dcc72d6e57c64790e patch-3.6.3.gz

After investigation, it appears that sendfile() sends the files in blocks
of 64 kbytes (16 times PAGE_SIZE). The problem is that at the end of each
block, the SPLICE_F_MORE flag is missing, therefore the hashing operation
is reset as if it was the end of the file.

This patch adds SPLICE_F_MORE to the flags when more data is pending.

With the patch applied, we get the correct sums:

root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz

root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/splice.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1187,7 +1187,7 @@ ssize_t splice_direct_to_actor(struct fi
long ret, bytes;
umode_t i_mode;
size_t len;
- int i, flags;
+ int i, flags, more;

/*
* We require the input being a regular file, as we don't want to
@@ -1230,6 +1230,7 @@ ssize_t splice_direct_to_actor(struct fi
* Don't block on output, we have to drain the direct pipe.
*/
sd->flags &= ~SPLICE_F_NONBLOCK;
+ more = sd->flags & SPLICE_F_MORE;

while (len) {
size_t read_len;
@@ -1243,6 +1244,15 @@ ssize_t splice_direct_to_actor(struct fi
sd->total_len = read_len;

/*
+ * If more data is pending, set SPLICE_F_MORE
+ * If this is the last data and SPLICE_F_MORE was not set
+ * initially, clears it.
+ */
+ if (read_len < len)
+ sd->flags |= SPLICE_F_MORE;
+ else if (!more)
+ sd->flags &= ~SPLICE_F_MORE;
+ /*
* NOTE: nonblocking mode only applies to the input. We
* must not do the output in nonblocking mode as then we
* could get stuck data in the internal pipe:
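
The flag handling added by this patch can be sketched in userspace as follows. This is only an illustration under stated assumptions: DEMO_F_MORE, chunk_flags() and count_more_chunks() are hypothetical names, and the chunk size is passed in by the caller (assumed non-zero) rather than derived from the pipe size.

```c
#include <stddef.h>

/* Illustrative flag bit standing in for SPLICE_F_MORE. */
#define DEMO_F_MORE 0x04u

/*
 * Compute the flags for one chunk: set MORE while data remains after this
 * chunk; on the last chunk, clear it unless the caller asked for it.
 */
static unsigned int chunk_flags(unsigned int flags, int caller_more,
                                size_t read_len, size_t len)
{
    if (read_len < len)
        flags |= DEMO_F_MORE;
    else if (!caller_more)
        flags &= ~DEMO_F_MORE;
    return flags;
}

/*
 * Walk len bytes in pieces of at most chunk bytes (chunk must be non-zero)
 * and count how many pieces would be sent with MORE set.
 */
static size_t count_more_chunks(size_t len, size_t chunk, unsigned int flags)
{
    int caller_more = (flags & DEMO_F_MORE) != 0;
    size_t n = 0;

    while (len) {
        size_t read_len = len < chunk ? len : chunk;

        flags = chunk_flags(flags, caller_more, read_len, len);
        if (flags & DEMO_F_MORE)
            n++;
        len -= read_len;
    }
    return n;
}
```

With the file sizes from the test above and a 65536-byte chunk, the 64011-byte file is sent as a single chunk without MORE, while the 94131-byte file is sent as one chunk with MORE followed by a final chunk without it.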

2015-11-24 22:47:30

by Ben Hutchings

Subject: [PATCH 3.2 10/52] Btrfs: added helper btrfs_next_item()

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Jan Schmidt <[email protected]>

commit c7d22a3c3cdb73d8a0151e2ccc8cf4a48c48310b upstream.

btrfs_next_item() makes the btrfs path point to the next item, crossing leaf
boundaries if needed.

Signed-off-by: Arne Jansen <[email protected]>
Signed-off-by: Jan Schmidt <[email protected]>
[bwh: Dependency of the following fix]
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/btrfs/ctree.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 50634abe..3e4a07b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2482,6 +2482,13 @@ static inline int btrfs_insert_empty_item(struct btrfs_trans_handle *trans,
}

int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path);
+static inline int btrfs_next_item(struct btrfs_root *root, struct btrfs_path *p)
+{
+ ++p->slots[0];
+ if (p->slots[0] >= btrfs_header_nritems(p->nodes[0]))
+ return btrfs_next_leaf(root, p);
+ return 0;
+}
int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path);
int btrfs_leaf_free_space(struct btrfs_root *root, struct extent_buffer *leaf);
void btrfs_drop_snapshot(struct btrfs_root *root,
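
The behaviour of the helper can be sketched with a toy model. Everything below is hypothetical (struct demo_leaf, struct demo_path, the demo_* functions and the static test data); it only mirrors the shape of the helper, assumes leaves are non-empty, and assumes the convention that the leaf-advancing call returns 0 on success and 1 when there is no further leaf.

```c
#include <stddef.h>

#define DEMO_MAX_ITEMS 4

/* Toy leaf: a few items plus a link to the next leaf. */
struct demo_leaf {
    int nritems;
    int items[DEMO_MAX_ITEMS];
    struct demo_leaf *next;
};

/* Toy path: the current leaf and the slot within it. */
struct demo_path {
    struct demo_leaf *leaf;
    int slot;
};

/* Move to slot 0 of the next leaf; return 1 if there is none. */
static int demo_next_leaf(struct demo_path *p)
{
    if (!p->leaf->next)
        return 1;
    p->leaf = p->leaf->next;
    p->slot = 0;
    return 0;
}

/* Same shape as the helper: bump the slot, cross the leaf boundary if needed. */
static int demo_next_item(struct demo_path *p)
{
    ++p->slot;
    if (p->slot >= p->leaf->nritems)
        return demo_next_leaf(p);
    return 0;
}

/* Sum every item reachable from the path's starting position. */
static int demo_sum_all(struct demo_path *p)
{
    int sum = 0;

    do {
        sum += p->leaf->items[p->slot];
    } while (demo_next_item(p) == 0);
    return sum;
}

/* Two linked leaves used as sample data. */
static struct demo_leaf demo_b = { 2, { 30, 40 }, NULL };
static struct demo_leaf demo_a = { 2, { 10, 20 }, &demo_b };
static struct demo_path demo_p = { &demo_a, 0 };
```

The caller's loop does not need to know where one leaf ends and the next begins, which is the convenience the helper provides.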

2015-11-24 22:47:33

by Ben Hutchings

Subject: [PATCH 3.2 02/52] PCI: Use function 0 VPD for identical functions, regular VPD for others

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Alex Williamson <[email protected]>

commit da2d03ea27f6ed9d2005a67b20dd021ddacf1e4d upstream.

932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0")
added PCI_DEV_FLAGS_VPD_REF_F0. Previously, we set the flag on every
non-zero function of quirked devices. If a function turned out to be
different from function 0, i.e., it had a different class, vendor ID, or
device ID, the flag remained set but we didn't make VPD accessible at all.

Flip this around so we only set PCI_DEV_FLAGS_VPD_REF_F0 for functions that
are identical to function 0, and allow regular VPD access for any other
functions.

[bhelgaas: changelog, stable tag]
Fixes: 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0")
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Myron Stowe <[email protected]>
Acked-by: Mark Rustad <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/pci/access.c | 22 ----------------------
drivers/pci/quirks.c | 20 ++++++++++++++++++--
2 files changed, 18 insertions(+), 24 deletions(-)

--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -391,23 +391,6 @@ static const struct pci_vpd_ops pci_vpd_
.release = pci_vpd_pci22_release,
};

-static int pci_vpd_f0_dev_check(struct pci_dev *dev)
-{
- struct pci_dev *tdev = pci_get_slot(dev->bus,
- PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
- int ret = 0;
-
- if (!tdev)
- return -ENODEV;
- if (!tdev->vpd || !tdev->multifunction ||
- dev->class != tdev->class || dev->vendor != tdev->vendor ||
- dev->device != tdev->device)
- ret = -ENODEV;
-
- pci_dev_put(tdev);
- return ret;
-}
-
int pci_vpd_pci22_init(struct pci_dev *dev)
{
struct pci_vpd_pci22 *vpd;
@@ -416,12 +399,7 @@ int pci_vpd_pci22_init(struct pci_dev *d
cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
if (!cap)
return -ENODEV;
- if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
- int ret = pci_vpd_f0_dev_check(dev);

- if (ret)
- return ret;
- }
vpd = kzalloc(sizeof(*vpd), GFP_ATOMIC);
if (!vpd)
return -ENOMEM;
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1941,12 +1941,28 @@ static void __devinit quirk_netmos(struc
}
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID, quirk_netmos);

+/*
+ * Quirk non-zero PCI functions to route VPD access through function 0 for
+ * devices that share VPD resources between functions. The functions are
+ * expected to be identical devices.
+ */
static void quirk_f0_vpd_link(struct pci_dev *dev)
{
+ struct pci_dev *f0;
+
if ((dev->class >> 8) != PCI_CLASS_NETWORK_ETHERNET ||
- !dev->multifunction || !PCI_FUNC(dev->devfn))
+ !PCI_FUNC(dev->devfn))
+ return;
+
+ f0 = pci_get_slot(dev->bus, PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+ if (!f0)
return;
- dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
+
+ if (f0->vpd && dev->class == f0->class &&
+ dev->vendor == f0->vendor && dev->device == f0->device)
+ dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
+
+ pci_dev_put(f0);
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_f0_vpd_link);

2015-11-24 22:47:22

by Ben Hutchings

Subject: [PATCH 3.2 51/52] net: avoid NULL deref in inet_ctl_sock_destroy()

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit 8fa677d2706d325d71dab91bf6e6512c05214e37 ]

Under low memory conditions, tcp_sk_init() and icmp_sk_init()
can both iterate over all possible CPUs and call inet_ctl_sock_destroy(),
possibly with a NULL pointer.

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
include/net/inet_common.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -38,7 +38,8 @@ extern int inet_ctl_sock_create(struct s

static inline void inet_ctl_sock_destroy(struct sock *sk)
{
- sk_release_kernel(sk);
+ if (sk)
+ sk_release_kernel(sk);
}

#endif

2015-11-24 22:47:17

by Ben Hutchings

Subject: [PATCH 3.2 18/52] Bluetooth: ath3k: Add support of AR3012 0cf3:817b device

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Tunin <[email protected]>

commit 18e0afab8ce3f1230ce3fef52b2e73374fd9c0e7 upstream.

T: Bus=04 Lev=02 Prnt=02 Port=04 Cnt=01 Dev#= 3 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0cf3 ProdID=817b Rev=00.02
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

BugLink: https://bugs.launchpad.net/bugs/1506615

Signed-off-by: Dmitry Tunin <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/bluetooth/ath3k.c | 2 ++
drivers/bluetooth/btusb.c | 1 +
2 files changed, 3 insertions(+)

--- a/drivers/bluetooth/ath3k.c
+++ b/drivers/bluetooth/ath3k.c
@@ -102,6 +102,7 @@ static struct usb_device_id ath3k_table[
{ USB_DEVICE(0x0CF3, 0x311F) },
{ USB_DEVICE(0x0cf3, 0x3121) },
{ USB_DEVICE(0x0CF3, 0x817a) },
+ { USB_DEVICE(0x0CF3, 0x817b) },
{ USB_DEVICE(0x0cf3, 0xe003) },
{ USB_DEVICE(0x0CF3, 0xE004) },
{ USB_DEVICE(0x0CF3, 0xE005) },
@@ -161,6 +162,7 @@ static struct usb_device_id ath3k_blist_
{ USB_DEVICE(0x0cf3, 0x311F), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0cf3, 0x3121), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0CF3, 0x817a), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0CF3, 0x817b), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0cf3, 0xe003), .driver_info = BTUSB_ATH3012 },
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -187,6 +187,7 @@ static struct usb_device_id blacklist_ta
{ USB_DEVICE(0x0cf3, 0x311f), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0cf3, 0x3121), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0cf3, 0x817a), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0x817b), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0cf3, 0xe003), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 },
{ USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 },

2015-11-24 22:47:14

by Ben Hutchings

Subject: [PATCH 3.2 09/52] packet: fix match_fanout_group()

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

commit 161642e24fee40fba2c5bc2ceacc00d118a22d65 upstream.

Recent TCP listener patches exposed a prior af_packet bug:
match_fanout_group() blindly assumes it is always safe
to cast sk to a packet socket to compare fanout with af_packet_priv.

But SYNACK packets can be sent while attached to request_sock, which
are smaller than a "struct sock".

We can read nonexistent memory and crash.

Fixes: c0de08d04215 ("af_packet: don't emit packet on orig fanout group")
Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: Eric Leblond <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/packet/af_packet.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1263,10 +1263,10 @@ static void __fanout_unlink(struct sock

bool match_fanout_group(struct packet_type *ptype, struct sock * sk)
{
- if (ptype->af_packet_priv == (void*)((struct packet_sock *)sk)->fanout)
- return true;
+ if (sk->sk_family != PF_PACKET)
+ return false;

- return false;
+ return ptype->af_packet_priv == pkt_sk(sk)->fanout;
}

static int fanout_add(struct sock *sk, u16 id, u16 type_flags)

2015-11-24 22:47:07

by Ben Hutchings

Subject: [PATCH 3.2 35/52] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Filipe Manana <[email protected]>

commit 1d512cb77bdbda80f0dd0620a3b260d697fd581d upstream.

If we are using the NO_HOLES feature, we have a tiny time window when
running delalloc for a nodatacow inode where we can race with a concurrent
link or xattr add operation leading to a BUG_ON.

This happens because at run_delalloc_nocow() we end up casting a leaf item
of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
file extent item (struct btrfs_file_extent_item) and then analyse its
extent type field, which won't match any of the expected extent types
(values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
explicit BUG_ON(1).

The following sequence diagram shows how the race happens when running a
no-cow delalloc range [4K, 8K[ for inode 257 and we have the following
neighbour leafs:

             Leaf X (has N items)                    Leaf Y

 [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
           slot N - 2         slot N - 1              slot 0

(Note the implicit hole for inode 257 regarding the [0, 8K[ range)

            CPU 1                                   CPU 2

 run_delalloc_nocow()
   btrfs_lookup_file_extent()
     --> searches for a key with value
         (257 EXTENT_DATA 4096) in the
         fs/subvol tree
     --> returns us a path with
         path->nodes[0] == leaf X and
         path->slots[0] == N

   because path->slots[0] is >=
   btrfs_header_nritems(leaf X), it
   calls btrfs_next_leaf()

   btrfs_next_leaf()
     --> releases the path

                                        hard link added to our inode,
                                        with key (257 INODE_REF 500)
                                        added to the end of leaf X,
                                        so leaf X now has N + 1 keys

     --> searches for the key
         (257 INODE_REF 256), because
         it was the last key in leaf X
         before it released the path,
         with path->keep_locks set to 1

     --> ends up at leaf X again and
         it verifies that the key
         (257 INODE_REF 256) is no longer
         the last key in the leaf, so it
         returns with path->nodes[0] ==
         leaf X and path->slots[0] == N,
         pointing to the new item with
         key (257 INODE_REF 500)

   the loop iteration of run_delalloc_nocow()
   does not break out of the loop and continues
   because the key referenced in the path
   at path->nodes[0] and path->slots[0] is
   for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
   and its offset (500) is less than our delalloc
   range's end (8192)

the item pointed by the path, an inode reference item,
is (incorrectly) interpreted as a file extent item and
we get an invalid extent type, leading to the BUG_ON(1):

if (extent_type == BTRFS_FILE_EXTENT_REG ||
extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
(...)
} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
(...)
} else {
BUG_ON(1)
}

The same can happen if an xattr is added concurrently and ends up having
a key with an offset smaller than the delalloc range's end.

So fix this by skipping keys with a type smaller than
BTRFS_EXTENT_DATA_KEY.

Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
fs/btrfs/inode.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1127,8 +1127,14 @@ next_slot:
num_bytes = 0;
btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);

- if (found_key.objectid > ino ||
- found_key.type > BTRFS_EXTENT_DATA_KEY ||
+ if (found_key.objectid > ino)
+ break;
+ if (WARN_ON_ONCE(found_key.objectid < ino) ||
+ found_key.type < BTRFS_EXTENT_DATA_KEY) {
+ path->slots[0]++;
+ goto next_slot;
+ }
+ if (found_key.type > BTRFS_EXTENT_DATA_KEY ||
found_key.offset > end)
break;

2015-11-24 22:46:54

by Ben Hutchings

Subject: [PATCH 3.2 43/52] FS-Cache: Don't override netfs's primary_index if registering failed

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Kinglong Mee <[email protected]>

commit b130ed5998e62879a66bad08931a2b5e832da95c upstream.

Only override netfs->primary_index when registration succeeds.

Signed-off-by: Kinglong Mee <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
[bwh: Backported to 3.2: no n_active or flags fields in fscache_cookie]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/fs/fscache/netfs.c
+++ b/fs/fscache/netfs.c
@@ -22,6 +22,7 @@ static LIST_HEAD(fscache_netfs_list);
int __fscache_register_netfs(struct fscache_netfs *netfs)
{
struct fscache_netfs *ptr;
+ struct fscache_cookie *cookie;
int ret;

_enter("{%s}", netfs->name);
@@ -29,24 +30,23 @@ int __fscache_register_netfs(struct fsca
INIT_LIST_HEAD(&netfs->link);

/* allocate a cookie for the primary index */
- netfs->primary_index =
- kmem_cache_zalloc(fscache_cookie_jar, GFP_KERNEL);
+ cookie = kmem_cache_zalloc(fscache_cookie_jar, GFP_KERNEL);

- if (!netfs->primary_index) {
+ if (!cookie) {
_leave(" = -ENOMEM");
return -ENOMEM;
}

/* initialise the primary index cookie */
- atomic_set(&netfs->primary_index->usage, 1);
- atomic_set(&netfs->primary_index->n_children, 0);
+ atomic_set(&cookie->usage, 1);
+ atomic_set(&cookie->n_children, 0);

- netfs->primary_index->def = &fscache_fsdef_netfs_def;
- netfs->primary_index->parent = &fscache_fsdef_index;
- netfs->primary_index->netfs_data = netfs;
+ cookie->def = &fscache_fsdef_netfs_def;
+ cookie->parent = &fscache_fsdef_index;
+ cookie->netfs_data = netfs;

- spin_lock_init(&netfs->primary_index->lock);
- INIT_HLIST_HEAD(&netfs->primary_index->backing_objects);
+ spin_lock_init(&cookie->lock);
+ INIT_HLIST_HEAD(&cookie->backing_objects);

/* check the netfs type is not already present */
down_write(&fscache_addremove_sem);
@@ -57,9 +57,10 @@ int __fscache_register_netfs(struct fsca
goto already_registered;
}

- atomic_inc(&netfs->primary_index->parent->usage);
- atomic_inc(&netfs->primary_index->parent->n_children);
+ atomic_inc(&cookie->parent->usage);
+ atomic_inc(&cookie->parent->n_children);

+ netfs->primary_index = cookie;
list_add(&netfs->link, &fscache_netfs_list);
ret = 0;

@@ -69,10 +70,8 @@ int __fscache_register_netfs(struct fsca
already_registered:
up_write(&fscache_addremove_sem);

- if (ret < 0) {
- kmem_cache_free(fscache_cookie_jar, netfs->primary_index);
- netfs->primary_index = NULL;
- }
+ if (ret < 0)
+ kmem_cache_free(fscache_cookie_jar, cookie);

_leave(" = %d", ret);
return ret;

2015-11-24 22:53:46

by Ben Hutchings

Subject: [PATCH 3.2 01/52] PCI: Fix devfn for VPD access through function 0

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Alex Williamson <[email protected]>

commit 9d9240756e63dd87d6cbf5da8b98ceb8f8192b55 upstream.

Commit 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function
0") passes PCI_SLOT(devfn) for the devfn parameter of pci_get_slot().
Generally this works because we're fairly well guaranteed that a PCIe
device is at slot address 0, but for the general case, including
conventional PCI, it's incorrect. We need to get the slot and then convert
it back into a devfn.

Fixes: 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0")
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Myron Stowe <[email protected]>
Acked-by: Mark Rustad <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/pci/access.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -358,7 +358,8 @@ static const struct pci_vpd_ops pci_vpd_
static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
void *arg)
{
- struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
+ struct pci_dev *tdev = pci_get_slot(dev->bus,
+ PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
ssize_t ret;

if (!tdev)
@@ -372,7 +373,8 @@ static ssize_t pci_vpd_f0_read(struct pc
static ssize_t pci_vpd_f0_write(struct pci_dev *dev, loff_t pos, size_t count,
const void *arg)
{
- struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
+ struct pci_dev *tdev = pci_get_slot(dev->bus,
+ PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
ssize_t ret;

if (!tdev)
@@ -391,7 +393,8 @@ static const struct pci_vpd_ops pci_vpd_

static int pci_vpd_f0_dev_check(struct pci_dev *dev)
{
- struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
+ struct pci_dev *tdev = pci_get_slot(dev->bus,
+ PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
int ret = 0;

if (!tdev)
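
The difference between the old and new argument can be checked with plain arithmetic. The sketch below re-declares the devfn bit layout (5-bit slot, 3-bit function) under demo names so that it builds outside the kernel; DEMO_DEVFN, DEMO_SLOT, DEMO_FUNC and the two f0_devfn_* helpers are hypothetical names used only for illustration.

```c
/* devfn packs a 5-bit slot number above a 3-bit function number. */
#define DEMO_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEMO_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define DEMO_FUNC(devfn)       ((devfn) & 0x07)

/* Old behaviour: the bare slot number is passed where a devfn is expected. */
static unsigned int f0_devfn_buggy(unsigned int devfn)
{
    return DEMO_SLOT(devfn);
}

/* New behaviour: rebuild a devfn that names function 0 of the same slot. */
static unsigned int f0_devfn_fixed(unsigned int devfn)
{
    return DEMO_DEVFN(DEMO_SLOT(devfn), 0);
}
```

For a device in slot 0 both helpers return 0, which is why the bug usually went unnoticed. For slot 3, function 1 (devfn 25) the old form yields 3, which decodes as slot 0, function 3, while the fixed form yields 24, which is slot 3, function 0.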

2015-11-24 22:46:51

by Ben Hutchings

Subject: [PATCH 3.2 19/52] staging: rtl8712: Add device ID for Sitecom WLA2100

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Larry Finger <[email protected]>

commit 1e6e63283691a2a9048a35d9c6c59cf0abd342e4 upstream.

This adds the USB ID for the Sitecom WLA2100. The Windows 10 inf file
was checked to verify that the addition is correct.

Reported-by: Frans van de Wiel <[email protected]>
Signed-off-by: Larry Finger <[email protected]>
Cc: Frans van de Wiel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/staging/rtl8712/usb_intf.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/staging/rtl8712/usb_intf.c
+++ b/drivers/staging/rtl8712/usb_intf.c
@@ -147,6 +147,7 @@ static struct usb_device_id rtl871x_usb_
{USB_DEVICE(0x0DF6, 0x0058)},
{USB_DEVICE(0x0DF6, 0x0049)},
{USB_DEVICE(0x0DF6, 0x004C)},
+ {USB_DEVICE(0x0DF6, 0x006C)},
{USB_DEVICE(0x0DF6, 0x0064)},
/* Skyworth */
{USB_DEVICE(0x14b2, 0x3300)},

2015-11-24 22:53:55

by Ben Hutchings

Subject: [PATCH 3.2 08/52] devres: fix a for loop bounds check

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

commit 1f35d04a02a652f14566f875aef3a6f2af4cb77b upstream.

The iomap[] array has PCIM_IOMAP_MAX (6) elements and not
DEVICE_COUNT_RESOURCE (16). This bug was found using a static checker.
It may be that the "if (!(mask & (1 << i)))" check means we never
actually go past the end of the array in real life.

Fixes: ec04b075843d ('iomap: implement pcim_iounmap_regions()')
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
lib/devres.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/devres.c
+++ b/lib/devres.c
@@ -339,7 +339,7 @@ void pcim_iounmap_regions(struct pci_dev
if (!iomap)
return;

- for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+ for (i = 0; i < PCIM_IOMAP_MAX; i++) {
if (!(mask & (1 << i)))
continue;

2015-11-24 22:53:59

by Ben Hutchings

Subject: [PATCH 3.2 05/52] wm831x_power: Use IRQF_ONESHOT to request threaded IRQs

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Valentin Rothberg <[email protected]>

commit 90adf98d9530054b8e665ba5a928de4307231d84 upstream.

Since commit 1c6c69525b40 ("genirq: Reject bogus threaded irq requests")
threaded IRQs without a primary handler need to be requested with
IRQF_ONESHOT, otherwise the request will fail.

scripts/coccinelle/misc/irqf_oneshot.cocci detected this issue.

Fixes: b5874f33bbaf ("wm831x_power: Use genirq")
Signed-off-by: Valentin Rothberg <[email protected]>
Signed-off-by: Sebastian Reichel <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/power/wm831x_power.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/power/wm831x_power.c
+++ b/drivers/power/wm831x_power.c
@@ -557,7 +557,7 @@ static __devinit int wm831x_power_probe(

irq = platform_get_irq_byname(pdev, "SYSLO");
ret = request_threaded_irq(irq, NULL, wm831x_syslo_irq,
- IRQF_TRIGGER_RISING, "System power low",
+ IRQF_TRIGGER_RISING | IRQF_ONESHOT, "System power low",
power);
if (ret != 0) {
dev_err(&pdev->dev, "Failed to request SYSLO IRQ %d: %d\n",
@@ -567,7 +567,7 @@ static __devinit int wm831x_power_probe(

irq = platform_get_irq_byname(pdev, "PWR SRC");
ret = request_threaded_irq(irq, NULL, wm831x_pwr_src_irq,
- IRQF_TRIGGER_RISING, "Power source",
+ IRQF_TRIGGER_RISING | IRQF_ONESHOT, "Power source",
power);
if (ret != 0) {
dev_err(&pdev->dev, "Failed to request PWR SRC IRQ %d: %d\n",
@@ -578,7 +578,7 @@ static __devinit int wm831x_power_probe(
for (i = 0; i < ARRAY_SIZE(wm831x_bat_irqs); i++) {
irq = platform_get_irq_byname(pdev, wm831x_bat_irqs[i]);
ret = request_threaded_irq(irq, NULL, wm831x_bat_irq,
- IRQF_TRIGGER_RISING,
+ IRQF_TRIGGER_RISING | IRQF_ONESHOT,
wm831x_bat_irqs[i],
power);
if (ret != 0) {

2015-11-24 22:53:52

by Ben Hutchings

Subject: [PATCH 3.2 06/52] mwifiex: fix mwifiex_rdeeprom_read()

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <[email protected]>

commit 1f9c6e1bc1ba5f8a10fcd6e99d170954d7c6d382 upstream.

There were several bugs here.

1) The done label was in the wrong place so we didn't copy any
information out when there was no command given.

2) We were using PAGE_SIZE as the size of the buffer instead of
"PAGE_SIZE - pos".

3) snprintf() returns the number of characters that would have been
printed if there were enough space. If there was not enough space
(and we had fixed the memory corruption bug #2) then it would result
in an information leak when we do simple_read_from_buffer(). I've
changed it to use scnprintf() instead.

I also removed the initialization at the start of the function, because
I thought it made the code a little more clear.

Fixes: 5e6e3a92b9a4 ('wireless: mwifiex: initial commit for Marvell mwifiex driver')
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Amitkumar Karwar <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
drivers/net/wireless/mwifiex/debugfs.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

--- a/drivers/net/wireless/mwifiex/debugfs.c
+++ b/drivers/net/wireless/mwifiex/debugfs.c
@@ -633,7 +633,7 @@ mwifiex_rdeeprom_read(struct file *file,
(struct mwifiex_private *) file->private_data;
unsigned long addr = get_zeroed_page(GFP_KERNEL);
char *buf = (char *) addr;
- int pos = 0, ret = 0, i;
+ int pos, ret, i;
u8 value[MAX_EEPROM_DATA];

if (!buf)
@@ -641,7 +641,7 @@ mwifiex_rdeeprom_read(struct file *file,

if (saved_offset == -1) {
/* No command has been given */
- pos += snprintf(buf, PAGE_SIZE, "0");
+ pos = snprintf(buf, PAGE_SIZE, "0");
goto done;
}

@@ -650,17 +650,17 @@ mwifiex_rdeeprom_read(struct file *file,
(u16) saved_bytes, value);
if (ret) {
ret = -EINVAL;
- goto done;
+ goto out_free;
}

- pos += snprintf(buf, PAGE_SIZE, "%d %d ", saved_offset, saved_bytes);
+ pos = snprintf(buf, PAGE_SIZE, "%d %d ", saved_offset, saved_bytes);

for (i = 0; i < saved_bytes; i++)
- pos += snprintf(buf + strlen(buf), PAGE_SIZE, "%d ", value[i]);
-
- ret = simple_read_from_buffer(ubuf, count, ppos, buf, pos);
+ pos += scnprintf(buf + pos, PAGE_SIZE - pos, "%d ", value[i]);

done:
+ ret = simple_read_from_buffer(ubuf, count, ppos, buf, pos);
+out_free:
free_page(addr);
return ret;
}
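
Points 2 and 3 of the changelog can be reproduced in userspace. The sketch below is an approximation only: demo_scnprintf() is a hypothetical stand-in for the kernel's scnprintf(), built on vsnprintf(), and fill() and demo_buf are illustrative names.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Userspace approximation of scnprintf(): returns the number of characters
 * actually stored in buf, not counting the terminating NUL.
 */
static int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (size == 0)
        return 0;
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    if (n < 0)
        return 0;
    /* vsnprintf() reports the would-be length; clamp it to what fits. */
    return (size_t)n >= size ? (int)(size - 1) : n;
}

/* Append count copies of "255 " to buf, passing the remaining space each time. */
static int fill(char *buf, size_t size, int count)
{
    int pos = 0, i;

    for (i = 0; i < count; i++)
        pos += demo_scnprintf(buf + pos, size - pos, "%d ", 255);
    return pos;
}

/* Small buffer used to demonstrate truncation. */
static char demo_buf[16];
```

Because the return value is clamped and the remaining space shrinks with pos, pos can never exceed the buffer size minus one. With plain snprintf() the would-be length is returned instead, so pos could grow past the end of the buffer and later be used as the length to copy out.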

2015-11-24 22:53:49

by Ben Hutchings

Subject: [PATCH 3.2 03/52] mac80211: fix driver RSSI event calculations

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Johannes Berg <[email protected]>

commit 8ec6d97871f37e4743678ea4a455bd59580aa0f4 upstream.

The ifmgd->ave_beacon_signal value cannot be taken as is for
comparisons, it must be divided by 16 since it's represented
like that for better accuracy of the EWMA calculations. This
would lead to invalid driver RSSI events. Fix the used value.

Fixes: 615f7b9bb1f8 ("mac80211: add driver RSSI threshold events")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
---
net/mac80211/mlme.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -1840,7 +1840,7 @@ static void ieee80211_rx_mgmt_beacon(str

if (ifmgd->rssi_min_thold != ifmgd->rssi_max_thold &&
ifmgd->count_beacon_signal >= IEEE80211_SIGNAL_AVE_MIN_COUNT) {
- int sig = ifmgd->ave_beacon_signal;
+ int sig = ifmgd->ave_beacon_signal / 16;
int last_sig = ifmgd->last_ave_beacon_signal;

/*
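
Why the division matters can be shown with a small fixed-point sketch. The scale factor of 16 comes from the patch; the weight of 3 and the names DEMO_SCALE, DEMO_WEIGHT, ewma_init(), ewma_update() and ewma_read() are assumptions made for illustration and are not taken from the mac80211 source.

```c
/* The average is stored multiplied by DEMO_SCALE to keep fractional bits. */
#define DEMO_SCALE  16
/* Assumed weight given to each new sample, out of DEMO_SCALE. */
#define DEMO_WEIGHT 3

/* Start the average from the first sample (stored scaled). */
static int ewma_init(int signal)
{
    return signal * DEMO_SCALE;
}

/* Fold a new sample into the scaled average. */
static int ewma_update(int ave, int signal)
{
    return (DEMO_WEIGHT * signal * DEMO_SCALE +
            (DEMO_SCALE - DEMO_WEIGHT) * ave) / DEMO_SCALE;
}

/* Convert the scaled average back to the unit of the samples. */
static int ewma_read(int ave)
{
    return ave / DEMO_SCALE;
}
```

Starting at -60 and folding in one sample of -76 gives a stored value of -1008, which reads back as -63. Comparing the stored value directly against a threshold such as -70 would wrongly report the signal as below the threshold.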

2015-11-24 22:53:43

by Ben Hutchings

Subject: [PATCH 3.2 12/52] iommu/vt-d: Fix ATSR handling for Root-Complex integrated endpoints

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: David Woodhouse <[email protected]>

commit d14053b3c714178525f22660e6aaf41263d00056 upstream.

The VT-d specification says that "Software must enable ATS on endpoint
devices behind a Root Port only if the Root Port is reported as
supporting ATS transactions."

We walk up the tree to find a Root Port, but for integrated devices we
don't find one — we get to the host bridge. In that case we *should*
allow ATS. Currently we don't, which means that we are incorrectly
failing to use ATS for the integrated graphics. Fix that.

We should never break out of this loop "naturally" with bus==NULL,
since we'll always find bridge==NULL in that case (and now return 1).

So remove the check for (!bridge) after the loop, since it can never
happen. If it did, it would be worthy of a BUG_ON(!bridge). But since
it'll oops anyway in that case, that'll do just as well.

Signed-off-by: David Woodhouse <[email protected]>
[bwh: Backported to 3.2:
- Adjust context
- There's no (!bridge) check to remove]
Signed-off-by: Ben Hutchings <[email protected]>
---
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3572,10 +3572,15 @@ found:
for (bus = dev->bus; bus; bus = bus->parent) {
struct pci_dev *bridge = bus->self;

- if (!bridge || !pci_is_pcie(bridge) ||
+ /* If it's an integrated device, allow ATS */
+ if (!bridge)
+ return 1;
+ /* Connected via non-PCIe: no ATS */
+ if (!pci_is_pcie(bridge) ||
bridge->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
return 0;

+ /* If we found the root port, look it up in the ATSR */
if (bridge->pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
for (i = 0; i < atsru->devices_cnt; i++)
if (atsru->devices[i] == bridge)

2015-11-24 22:39:11

by Bart Van Assche

Subject: Re: [PATCH 3.2 39/52] scsi: Fix a bdi reregistration race

On 11/24/2015 02:35 PM, Ben Hutchings wrote:
> 3.2.74-rc1 review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Bart Van Assche <[email protected]>
>
> commit bf2cf3baa20b0a6cd2d08707ef05dc0e992a8aa0 upstream.

Hi Ben,

This patch fixes one bug but introduces another one ... So it's probably
better to drop this patch. See also the discussion in
http://www.spinics.net/lists/linux-scsi/msg90920.html.

Thanks,

Bart.

2015-11-24 23:54:20

by Stefan Richter

Subject: Re: [PATCH 3.2 31/52] firewire: ohci: fix JMicron JMB38x IT context discovery

On Nov 24 Ben Hutchings wrote:
> commit 100ceb66d5c40cc0c7018e06a9474302470be73c upstream.
...
> [bwh: Backported to 3.2: log with fw_notify() instead of ohci_notice()]

Thanks for doing so, looks good to me.
--
Stefan Richter
-=====-===== =-== ==--=
http://arcgraph.de/sr/

2015-11-25 02:06:25

by James Morris

Subject: Re: [PATCH 3.2 46/52] fs: make dumpable=2 require fully qualified path

On Tue, 24 Nov 2015, Ben Hutchings wrote:

> 3.2.74-rc1 review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Kees Cook <[email protected]>
>
> commit 9520628e8ceb69fa9a4aee6b57f22675d9e1b709 upstream.
>


Reviewed-by: James Morris <[email protected]>

--
James Morris
<[email protected]>

2015-11-25 02:22:45

by Guenter Roeck

Subject: Re: [PATCH 3.2 00/52] 3.2.74-rc1 review

On 11/24/2015 02:33 PM, Ben Hutchings wrote:
> This is the start of the stable review cycle for the 3.2.74 release.
> There are 52 patches in this series, which will be posted as responses
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Nov 27 00:00:00 UTC 2015.
> Anything received after that time might be too late.
>

Build results:
total: 93 pass: 93 fail: 0
Qemu test results:
total: 58 pass: 58 fail: 0

Details are available at http://server.roeck-us.net:8010/builders.

Guenter

2015-11-25 02:38:12

by Chen Yu

Subject: Re: [PATCH 3.2 20/52] ACPI: Use correct IRQ when uninstalling ACPI interrupt handler

Hi, Ben,
ok for me.

thanks,
Yu
On Tue, 2015-11-24 at 22:33 +0000, Ben Hutchings wrote:
> 3.2.74-rc1 review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Chen Yu <[email protected]>
>
> commit 49e4b84333f338d4f183f28f1f3c1131b9fb2b5a upstream.
>
> Currently when the system is trying to uninstall the ACPI interrupt
> handler, it uses acpi_gbl_FADT.sci_interrupt as the IRQ number.
> However, the IRQ number that the ACPI interrupt handler is installed
> for comes from acpi_gsi_to_irq() and that is the number that should
> be used for the handler removal.
>
> Fix this problem by using the mapped IRQ returned from acpi_gsi_to_irq()
> as appropriate.
>
> Acked-by: Lv Zheng <[email protected]>
> Signed-off-by: Chen Yu <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> [bwh: Backported to 3.2: adjust context]
> Signed-off-by: Ben Hutchings <[email protected]>
> ---
> drivers/acpi/osl.c | 9 ++++++---
> include/linux/acpi.h | 6 ++++++
> 2 files changed, 12 insertions(+), 3 deletions(-)
>
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -82,6 +82,7 @@ static struct workqueue_struct *kacpid_w
> static struct workqueue_struct *kacpi_notify_wq;
> struct workqueue_struct *kacpi_hotplug_wq;
> EXPORT_SYMBOL(kacpi_hotplug_wq);
> +unsigned int acpi_sci_irq = INVALID_ACPI_IRQ;
>
> struct acpi_res_list {
> resource_size_t start;
> @@ -566,17 +567,19 @@ acpi_os_install_interrupt_handler(u32 gs
> acpi_irq_handler = NULL;
> return AE_NOT_ACQUIRED;
> }
> + acpi_sci_irq = irq;
>
> return AE_OK;
> }
>
> -acpi_status acpi_os_remove_interrupt_handler(u32 irq, acpi_osd_handler handler)
> +acpi_status acpi_os_remove_interrupt_handler(u32 gsi, acpi_osd_handler handler)
> {
> - if (irq != acpi_gbl_FADT.sci_interrupt)
> + if (gsi != acpi_gbl_FADT.sci_interrupt || !acpi_sci_irq_valid())
> return AE_BAD_PARAMETER;
>
> - free_irq(irq, acpi_irq);
> + free_irq(acpi_sci_irq, acpi_irq);
> acpi_irq_handler = NULL;
> + acpi_sci_irq = INVALID_ACPI_IRQ;
>
> return AE_OK;
> }
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -110,6 +110,12 @@ int acpi_unregister_ioapic(acpi_handle h
> void acpi_irq_stats_init(void);
> extern u32 acpi_irq_handled;
> extern u32 acpi_irq_not_handled;
> +extern unsigned int acpi_sci_irq;
> +#define INVALID_ACPI_IRQ ((unsigned)-1)
> +static inline bool acpi_sci_irq_valid(void)
> +{
> + return acpi_sci_irq != INVALID_ACPI_IRQ;
> +}
>
> extern int sbf_port;
> extern unsigned long acpi_realmode_flags;
>
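
The install/remove pairing described in the quoted commit message can be sketched in userspace as follows. This is a model under assumptions, not the kernel code: the `model_*` names, the GSI value 9, and the `gsi + 16` mapping are all invented for illustration, and `model_last_freed` stands in for the argument that would be passed to free_irq().

```c
#define MODEL_INVALID_IRQ ((unsigned int)-1)
#define MODEL_SCI_GSI     9u   /* assumed value of the FADT SCI interrupt */

unsigned int model_sci_irq = MODEL_INVALID_IRQ;
unsigned int model_last_freed = MODEL_INVALID_IRQ;

/* Hypothetical GSI-to-IRQ mapping; the real one is platform specific. */
unsigned int model_gsi_to_irq(unsigned int gsi)
{
    return gsi + 16;
}

/* Install: remember the mapped IRQ, not the GSI. */
int model_install(unsigned int gsi)
{
    if (gsi != MODEL_SCI_GSI)
        return -1;
    model_sci_irq = model_gsi_to_irq(gsi);
    return 0;
}

/* Remove: validate the GSI, but free the IRQ recorded at install time. */
int model_remove(unsigned int gsi)
{
    if (gsi != MODEL_SCI_GSI || model_sci_irq == MODEL_INVALID_IRQ)
        return -1;
    model_last_freed = model_sci_irq;   /* stands in for free_irq() */
    model_sci_irq = MODEL_INVALID_IRQ;
    return 0;
}
```

With any mapping where the IRQ differs from the GSI, freeing the GSI number directly would release the wrong line, which is the bug the patch addresses.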

2015-11-25 11:31:40

by Paolo Bonzini

Subject: Re: [PATCH 3.2 41/52] KVM: svm: unconditionally intercept #DB



On 24/11/2015 23:33, Ben Hutchings wrote:
> 3.2.74-rc1 review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Paolo Bonzini <[email protected]>
>
> commit cbdb967af3d54993f5814f1cee0ed311a055377d upstream.
>
> This is needed to avoid the possibility that the guest triggers
> an infinite stream of #DB exceptions (CVE-2015-8104).
>
> VMX is not affected: because it does not save DR6 in the VMCS,
> it already intercepts #DB unconditionally.
>
> Reported-by: Jan Beulich <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> [bwh: Backported to 3.2: #DB and #BP did not share a function, and there is
> no operation pointer referring to it, so remove update_db_intercept()
> entirely]

This is wrong, you still need to check the BP intercept in the
(incorrectly named as of 3.2) update_db_intercept function.

Something like:

-static void update_db_intercept(struct kvm_vcpu *vcpu)
+static void update_bp_intercept(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

- clr_exception_intercept(svm, DB_VECTOR);
clr_exception_intercept(svm, BP_VECTOR);
-
- if (svm->nmi_singlestep)
- set_exception_intercept(svm, DB_VECTOR);
-
if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
- if (vcpu->guest_debug &
- (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
- set_exception_intercept(svm, DB_VECTOR);
if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
set_exception_intercept(svm, BP_VECTOR);
} else
vcpu->guest_debug = 0;
}


Then the calls in db_interception and enable_nmi_window can be removed,
but the one in svm_guest_debug is important.
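
A small userspace model may help show why the remaining call matters. The sketch below assumes a plain bitmask for the exception intercepts; the `model_*` names and flag constants are illustrative, not the kernel's definitions. #DB is intercepted once at init and never cleared, while #BP is still toggled from the guest-debug flags.

```c
#include <stdint.h>

#define MODEL_DB_VECTOR 1
#define MODEL_BP_VECTOR 3

/* Illustrative flag values for this sketch only. */
#define MODEL_GUESTDBG_ENABLE    0x00000001u
#define MODEL_GUESTDBG_USE_SW_BP 0x00010000u

struct model_vcpu {
    uint32_t intercept_exceptions;
    uint32_t guest_debug;
};

/* Models init_vmcb(): #DB is intercepted unconditionally. */
void model_init(struct model_vcpu *v)
{
    v->intercept_exceptions = 1u << MODEL_DB_VECTOR;
    v->guest_debug = 0;
}

/* Models the renamed helper: only the #BP intercept is updated. */
void model_update_bp_intercept(struct model_vcpu *v)
{
    v->intercept_exceptions &= ~(1u << MODEL_BP_VECTOR);

    if (v->guest_debug & MODEL_GUESTDBG_ENABLE) {
        if (v->guest_debug & MODEL_GUESTDBG_USE_SW_BP)
            v->intercept_exceptions |= 1u << MODEL_BP_VECTOR;
    } else {
        v->guest_debug = 0;
    }
}
```

If the helper were removed entirely, nothing would ever set the #BP bit when userspace enables software breakpoints.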

Paolo

2015-11-25 17:44:24

by Ben Hutchings

Subject: Re: [PATCH 3.2 00/52] 3.2.74-rc1 review

This is the combined diff for 3.2.74-rc1 relative to 3.2.73.

Ben.

--
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.


Attachments:
linux-3.2.74-rc1.patch (52.76 kB)

2015-11-25 17:56:54

by Ben Hutchings

Subject: Re: [PATCH 3.2 41/52] KVM: svm: unconditionally intercept #DB

On Wed, 2015-11-25 at 12:31 +0100, Paolo Bonzini wrote:
>
> On 24/11/2015 23:33, Ben Hutchings wrote:
> > 3.2.74-rc1 review patch.  If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Paolo Bonzini <[email protected]>
> >
> > commit cbdb967af3d54993f5814f1cee0ed311a055377d upstream.
> >
> > This is needed to avoid the possibility that the guest triggers
> > an infinite stream of #DB exceptions (CVE-2015-8104).
> >
> > VMX is not affected: because it does not save DR6 in the VMCS,
> > it already intercepts #DB unconditionally.
> >
> > Reported-by: Jan Beulich <[email protected]>
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > [bwh: Backported to 3.2: #DB and #BP did not share a function, and there is
> >  no operation pointer referring to it, so remove update_db_intercept()
> >  entirely]
>
> This is wrong, you still need to check the BP intercept in the
> (incorrectly named as of 3.2) update_db_intercept function.
>
> Something like:
>
> -static void update_db_intercept(struct kvm_vcpu *vcpu)
> +static void update_bp_intercept(struct kvm_vcpu *vcpu)
>  {
>      struct vcpu_svm *svm = to_svm(vcpu);
>
> -    clr_exception_intercept(svm, DB_VECTOR);
>      clr_exception_intercept(svm, BP_VECTOR);
> -
> -    if (svm->nmi_singlestep)
> -        set_exception_intercept(svm, DB_VECTOR);
> -
>      if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
> -        if (vcpu->guest_debug &
> -            (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> -            set_exception_intercept(svm, DB_VECTOR);
>          if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
>              set_exception_intercept(svm, BP_VECTOR);
>      } else
>          vcpu->guest_debug = 0;
>  }
>
>
> Then the calls in db_interception and enable_nmi_window can be removed,
> but the one in svm_guest_debug is important.

Sorry about that. I now have this version:

From: Paolo Bonzini <[email protected]>
Date: Tue, 10 Nov 2015 09:14:39 +0100
Subject: KVM: svm: unconditionally intercept #DB

commit cbdb967af3d54993f5814f1cee0ed311a055377d upstream.

This is needed to avoid the possibility that the guest triggers
an infinite stream of #DB exceptions (CVE-2015-8104).

VMX is not affected: because it does not save DR6 in the VMCS,
it already intercepts #DB unconditionally.

Reported-by: Jan Beulich <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[bwh: Backported to 3.2, with thanks to Paolo:
 - update_db_bp_intercept() was called update_db_intercept()
 - The remaining call is in svm_guest_debug() rather than through svm_x86_ops]
Signed-off-by: Ben Hutchings <[email protected]>
---
 arch/x86/kvm/svm.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1015,6 +1015,7 @@ static void init_vmcb(struct vcpu_svm *s
  set_exception_intercept(svm, UD_VECTOR);
  set_exception_intercept(svm, MC_VECTOR);
  set_exception_intercept(svm, AC_VECTOR);
+ set_exception_intercept(svm, DB_VECTOR);
 
  set_intercept(svm, INTERCEPT_INTR);
  set_intercept(svm, INTERCEPT_NMI);
@@ -1550,20 +1551,13 @@ static void svm_set_segment(struct kvm_v
  mark_dirty(svm->vmcb, VMCB_SEG);
 }
 
-static void update_db_intercept(struct kvm_vcpu *vcpu)
+static void update_bp_intercept(struct kvm_vcpu *vcpu)
 {
  struct vcpu_svm *svm = to_svm(vcpu);
 
- clr_exception_intercept(svm, DB_VECTOR);
  clr_exception_intercept(svm, BP_VECTOR);
 
- if (svm->nmi_singlestep)
- set_exception_intercept(svm, DB_VECTOR);
-
  if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
- if (vcpu->guest_debug &
-     (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
- set_exception_intercept(svm, DB_VECTOR);
  if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
  set_exception_intercept(svm, BP_VECTOR);
  } else
@@ -1581,7 +1575,7 @@ static void svm_guest_debug(struct kvm_v
 
  mark_dirty(svm->vmcb, VMCB_DR);
 
- update_db_intercept(vcpu);
+ update_bp_intercept(vcpu);
 }
 
 static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
@@ -1655,7 +1649,6 @@ static int db_interception(struct vcpu_s
  if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
  svm->vmcb->save.rflags &=
  ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
- update_db_intercept(&svm->vcpu);
  }
 
  if (svm->vcpu.guest_debug &
@@ -3557,7 +3550,6 @@ static void enable_nmi_window(struct kvm
   */
  svm->nmi_singlestep = true;
  svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
- update_db_intercept(vcpu);
 }
 
 static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)

--
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.



2015-11-25 17:57:26

by Ben Hutchings

Subject: Re: [PATCH 3.2 00/52] 3.2.74-rc1 review

On Tue, 2015-11-24 at 18:22 -0800, Guenter Roeck wrote:
> On 11/24/2015 02:33 PM, Ben Hutchings wrote:
> > This is the start of the stable review cycle for the 3.2.74
> > release.
> > There are 52 patches in this series, which will be posted as
> > responses
> > to this one.  If anyone has any issues with these being applied,
> > please
> > let me know.
> >
> > Responses should be made by Fri Nov 27 00:00:00 UTC 2015.
> > Anything received after that time might be too late.
> >
>
> Build results:
> total: 93 pass: 93 fail: 0
> Qemu test results:
> total: 58 pass: 58 fail: 0
>
> Details are available at http://server.roeck-us.net:8010/builders.

Thanks for checking.

Ben.

--
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.



2015-11-25 17:58:09

by Ben Hutchings

Subject: Re: [PATCH 3.2 46/52] fs: make dumpable=2 require fully qualified path

On Wed, 2015-11-25 at 13:06 +1100, James Morris wrote:
> On Tue, 24 Nov 2015, Ben Hutchings wrote:
>
> > 3.2.74-rc1 review patch.  If anyone has any objections, please let
> > me know.
> >
> > ------------------
> >
> > From: Kees Cook <[email protected]>
> >
> > commit 9520628e8ceb69fa9a4aee6b57f22675d9e1b709 upstream.
> >
>
>
> Reviewed-by: James Morris <[email protected]>

Thanks.

Ben.

--
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.



2015-11-25 18:01:14

by Ben Hutchings

Subject: Re: [PATCH 3.2 39/52] scsi: Fix a bdi reregistration race

On Tue, 2015-11-24 at 14:39 -0800, Bart Van Assche wrote:
> On 11/24/2015 02:35 PM, Ben Hutchings wrote:
> > 3.2.74-rc1 review patch.  If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Bart Van Assche <[email protected]>
> >
> > commit bf2cf3baa20b0a6cd2d08707ef05dc0e992a8aa0 upstream.
>
> Hi Ben,
>
> This patch fixes one bug but introduces another one ... So it's probably
> better to drop this patch. See also the discussion in
> http://www.spinics.net/lists/linux-scsi/msg90920.html.

Thanks, I've now dropped it.

Ben.

--
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.



2015-11-25 18:06:55

by Paolo Bonzini

Subject: Re: [PATCH 3.2 41/52] KVM: svm: unconditionally intercept #DB



On 25/11/2015 18:56, Ben Hutchings wrote:
> On Wed, 2015-11-25 at 12:31 +0100, Paolo Bonzini wrote:
>>
>> On 24/11/2015 23:33, Ben Hutchings wrote:
>>> 3.2.74-rc1 review patch. If anyone has any objections, please let me know.
>>>
>>> ------------------
>>>
>>> From: Paolo Bonzini <[email protected]>
>>>
>>> commit cbdb967af3d54993f5814f1cee0ed311a055377d upstream.
>>>
>>> This is needed to avoid the possibility that the guest triggers
>>> an infinite stream of #DB exceptions (CVE-2015-8104).
>>>
>>> VMX is not affected: because it does not save DR6 in the VMCS,
>>> it already intercepts #DB unconditionally.
>>>
>>> Reported-by: Jan Beulich <[email protected]>
>>> Signed-off-by: Paolo Bonzini <[email protected]>
>>> [bwh: Backported to 3.2: #DB and #BP did not share a function, and there is
>>> no operation pointer referring to it, so remove update_db_intercept()
>>> entirely]
>>
>> This is wrong, you still need to check the BP intercept in the
>> (incorrectly named as of 3.2) update_db_intercept function.
>>
>> Something like:
>>
>> -static void update_db_intercept(struct kvm_vcpu *vcpu)
>> +static void update_bp_intercept(struct kvm_vcpu *vcpu)
>>  {
>>      struct vcpu_svm *svm = to_svm(vcpu);
>>
>> -    clr_exception_intercept(svm, DB_VECTOR);
>>      clr_exception_intercept(svm, BP_VECTOR);
>> -
>> -    if (svm->nmi_singlestep)
>> -        set_exception_intercept(svm, DB_VECTOR);
>> -
>>      if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
>> -        if (vcpu->guest_debug &
>> -            (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
>> -            set_exception_intercept(svm, DB_VECTOR);
>>          if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
>>              set_exception_intercept(svm, BP_VECTOR);
>>      } else
>>          vcpu->guest_debug = 0;
>>  }
>>
>>
>> Then the calls in db_interception and enable_nmi_window can be removed,
>> but the one in svm_guest_debug is important.
>
> Sorry about that. I now have this version:
>
> From: Paolo Bonzini <[email protected]>
> Date: Tue, 10 Nov 2015 09:14:39 +0100
> Subject: KVM: svm: unconditionally intercept #DB
>
> commit cbdb967af3d54993f5814f1cee0ed311a055377d upstream.
>
> This is needed to avoid the possibility that the guest triggers
> an infinite stream of #DB exceptions (CVE-2015-8104).
>
> VMX is not affected: because it does not save DR6 in the VMCS,
> it already intercepts #DB unconditionally.
>
> Reported-by: Jan Beulich <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> [bwh: Backported to 3.2, with thanks to Paolo:
> - update_db_bp_intercept() was called update_db_intercept()
> - The remaining call is in svm_guest_debug() rather than through svm_x86_ops]
> Signed-off-by: Ben Hutchings <[email protected]>
> ---
> arch/x86/kvm/svm.c | 14 +++-----------
> 1 file changed, 3 insertions(+), 11 deletions(-)
>
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1015,6 +1015,7 @@ static void init_vmcb(struct vcpu_svm *s
> set_exception_intercept(svm, UD_VECTOR);
> set_exception_intercept(svm, MC_VECTOR);
> set_exception_intercept(svm, AC_VECTOR);
> + set_exception_intercept(svm, DB_VECTOR);
>
> set_intercept(svm, INTERCEPT_INTR);
> set_intercept(svm, INTERCEPT_NMI);
> @@ -1550,20 +1551,13 @@ static void svm_set_segment(struct kvm_v
> mark_dirty(svm->vmcb, VMCB_SEG);
> }
>
> -static void update_db_intercept(struct kvm_vcpu *vcpu)
> +static void update_bp_intercept(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
>
> - clr_exception_intercept(svm, DB_VECTOR);
> clr_exception_intercept(svm, BP_VECTOR);
>
> - if (svm->nmi_singlestep)
> - set_exception_intercept(svm, DB_VECTOR);
> -
> if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
> - if (vcpu->guest_debug &
> - (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> - set_exception_intercept(svm, DB_VECTOR);
> if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
> set_exception_intercept(svm, BP_VECTOR);
> } else
> @@ -1581,7 +1575,7 @@ static void svm_guest_debug(struct kvm_v
>
> mark_dirty(svm->vmcb, VMCB_DR);
>
> - update_db_intercept(vcpu);
> + update_bp_intercept(vcpu);
> }
>
> static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
> @@ -1655,7 +1649,6 @@ static int db_interception(struct vcpu_s
> if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> svm->vmcb->save.rflags &=
> ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> - update_db_intercept(&svm->vcpu);
> }
>
> if (svm->vcpu.guest_debug &
> @@ -3557,7 +3550,6 @@ static void enable_nmi_window(struct kvm
> */
> svm->nmi_singlestep = true;
> svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> - update_db_intercept(vcpu);
> }
>
> static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
>

Thanks, this looks good.

Paolo

2015-11-25 23:05:16

by Luis Henriques

Subject: Re: [PATCH 3.2 22/52] ALSA: hda - Disable 64bit address for Creative HDA controllers

On Tue, Nov 24, 2015 at 10:33:59PM +0000, Ben Hutchings wrote:
> 3.2.74-rc1 review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Takashi Iwai <[email protected]>
>
> commit cadd16ea33a938d49aee99edd4758cc76048b399 upstream.
>
> We've had many reports that some Creative sound cards with CA0132
> don't work well. Some reported that it starts working after reloading
> the module, while some reported it starts working when a 32bit kernel
> is used. All these facts seem implying that the chip fails to
> communicate when the buffer is located in 64bit address.
>
> This patch addresses these issues by just adding AZX_DCAPS_NO_64BIT
> flag to the corresponding PCI entries. I casually had a chance to
> test an SB Recon3D board, and indeed this seems helping.
>
> Although this hasn't been tested on all Creative devices, it's safer
> to assume that this restriction applies to the rest of them, too. So
> the flag is applied to all Creative entries.
>
> Signed-off-by: Takashi Iwai <[email protected]>
> [bwh: Backported to 3.2: drop the change to AZX_DCAPS_PRESET_CTHDA]

Is there a reason for dropping this change? Adding the
AZX_DCAPS_NO_64BIT flag to the AZX_DCAPS_PRESET_CTHDA definition does
seem to make sense.

Cheers,
--
Luís

> Signed-off-by: Ben Hutchings <[email protected]>
> ---
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -3099,11 +3099,13 @@ static DEFINE_PCI_DEVICE_TABLE(azx_ids)
> .class = PCI_CLASS_MULTIMEDIA_HD_AUDIO << 8,
> .class_mask = 0xffffff,
> .driver_data = AZX_DRIVER_CTX | AZX_DCAPS_CTX_WORKAROUND |
> + AZX_DCAPS_NO_64BIT |
> AZX_DCAPS_RIRB_PRE_DELAY | AZX_DCAPS_POSFIX_LPIB },
> #else
> /* this entry seems still valid -- i.e. without emu20kx chip */
> { PCI_DEVICE(0x1102, 0x0009),
> .driver_data = AZX_DRIVER_CTX | AZX_DCAPS_CTX_WORKAROUND |
> + AZX_DCAPS_NO_64BIT |
> AZX_DCAPS_RIRB_PRE_DELAY | AZX_DCAPS_POSFIX_LPIB },
> #endif
> /* Vortex86MX */
>
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-11-25 23:11:06

by Luis Henriques

Subject: Re: [PATCH 3.2 38/52] Btrfs: fix race when listing an inode's xattrs

On Tue, Nov 24, 2015 at 10:33:59PM +0000, Ben Hutchings wrote:
> 3.2.74-rc1 review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Filipe Manana <[email protected]>
>
> commit f1cd1f0b7d1b5d4aaa5711e8f4e4898b0045cb6d upstream.
>
> When listing an inode's xattrs we have a time window where we race against
> a concurrent operation for adding a new hard link for our inode that makes
> us not return any xattr to user space. In order for this to happen, the
> first xattr of our inode needs to be at slot 0 of a leaf and the previous
> leaf must still have room for an inode ref (or extref) item, and this can
> happen because an inode's listxattrs callback does not lock the inode's
> i_mutex (nor does the VFS do it for us), but adding a hard link to an
> inode makes the VFS lock the inode's i_mutex before calling the inode's
> link callback.
>
> If we have the following leafs:
>
> Leaf X (has N items) Leaf Y
>
> [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 XATTR_ITEM 12345), ... ]
> slot N - 2 slot N - 1 slot 0
>
> The race illustrated by the following sequence diagram is possible:
>
> CPU 1 CPU 2
>
> btrfs_listxattr()
>
> searches for key (257 XATTR_ITEM 0)
>
> gets path with path->nodes[0] == leaf X
> and path->slots[0] == N
>
> because path->slots[0] is >=
> btrfs_header_nritems(leaf X), it calls
> btrfs_next_leaf()
>
> btrfs_next_leaf()
> releases the path
>
> adds key (257 INODE_REF 666)
> to the end of leaf X (slot N),
> and leaf X now has N + 1 items
>
> searches for the key (257 INODE_REF 256),
> with path->keep_locks == 1, because that
> is the last key it saw in leaf X before
> releasing the path
>
> ends up at leaf X again and it verifies
> that the key (257 INODE_REF 256) is no
> longer the last key in leaf X, so it
> returns with path->nodes[0] == leaf X
> and path->slots[0] == N, pointing to
> the new item with key (257 INODE_REF 666)
>
> btrfs_listxattr's loop iteration sees that
> the type of the key pointed by the path is
> different from the type BTRFS_XATTR_ITEM_KEY
> and so it breaks the loop and stops looking
> for more xattr items
> --> the application doesn't get any xattr
> listed for our inode
>
> So fix this by breaking the loop only if the key's type is greater than
> BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.
>
> Signed-off-by: Filipe Manana <[email protected]>
> [bwh: Backported to 3.2: s/found_key\.type/btrfs_key_type(\&found_key)/]

Actually, in my backport to 3.16 I decided to keep the usage of
'found_key.type' instead, as the usage of btrfs_key_type() has been
dropped with commit 962a298f3511 ("btrfs: kill the key type accessor
helpers").

Cheers,
--
Luís

> Signed-off-by: Ben Hutchings <[email protected]>
> ---
> fs/btrfs/xattr.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/fs/btrfs/xattr.c
> +++ b/fs/btrfs/xattr.c
> @@ -259,8 +259,10 @@ ssize_t btrfs_listxattr(struct dentry *d
> /* check to make sure this item is what we want */
> if (found_key.objectid != key.objectid)
> break;
> - if (btrfs_key_type(&found_key) != BTRFS_XATTR_ITEM_KEY)
> + if (btrfs_key_type(&found_key) > BTRFS_XATTR_ITEM_KEY)
> break;
> + if (btrfs_key_type(&found_key) < BTRFS_XATTR_ITEM_KEY)
> + goto next;
>
> di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
> if (verify_dir_item(root, leaf, di))
>
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
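
The loop condition changed by this patch can be illustrated with a userspace model of the key scan. The sketch assumes keys already sorted as in a leaf; the `model_*` names are hypothetical, and the numeric key types are taken to match the Btrfs on-disk values (inode item 1, inode ref 12, xattr item 24, dir item 84). Keys with a smaller type are skipped, and the scan stops only at a larger type or a different objectid.

```c
#include <stddef.h>

enum {
    MODEL_INODE_ITEM_KEY = 1,
    MODEL_INODE_REF_KEY  = 12,
    MODEL_XATTR_ITEM_KEY = 24,
    MODEL_DIR_ITEM_KEY   = 84
};

struct model_key {
    unsigned long long objectid;
    unsigned char type;
    unsigned long long offset;
};

/* Counts xattr items for inode ino, starting the scan at slot start. */
size_t model_count_xattrs(const struct model_key *keys, size_t n,
                          size_t start, unsigned long long ino)
{
    size_t found = 0;

    for (size_t i = start; i < n; i++) {
        /* A different inode means we are past our items. */
        if (keys[i].objectid != ino)
            break;
        /* A larger type sorts after all xattr items: stop. */
        if (keys[i].type > MODEL_XATTR_ITEM_KEY)
            break;
        /* A smaller type (e.g. a newly added inode ref): skip it. */
        if (keys[i].type < MODEL_XATTR_ITEM_KEY)
            continue;
        found++;
    }
    return found;
}
```

Starting the scan on a freshly inserted INODE_REF key, as in the race described in the quoted commit message, still reaches the xattr item instead of ending the loop early.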

2015-11-26 00:34:54

by Ben Hutchings

Subject: Re: [PATCH 3.2 22/52] ALSA: hda - Disable 64bit address for Creative HDA controllers

On Wed, 2015-11-25 at 23:05 +0000, Luis Henriques wrote:
> On Tue, Nov 24, 2015 at 10:33:59PM +0000, Ben Hutchings wrote:
> > 3.2.74-rc1 review patch.  If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Takashi Iwai <[email protected]>
> >
> > commit cadd16ea33a938d49aee99edd4758cc76048b399 upstream.
> >
> > We've had many reports that some Creative sound cards with CA0132
> > don't work well.  Some reported that it starts working after reloading
> > the module, while some reported it starts working when a 32bit kernel
> > is used.  All these facts seem implying that the chip fails to
> > communicate when the buffer is located in 64bit address.
> >
> > This patch addresses these issues by just adding AZX_DCAPS_NO_64BIT
> > flag to the corresponding PCI entries.  I casually had a chance to
> > test an SB Recon3D board, and indeed this seems helping.
> >
> > Although this hasn't been tested on all Creative devices, it's safer
> > to assume that this restriction applies to the rest of them, too.  So
> > the flag is applied to all Creative entries.
> >
> > Signed-off-by: Takashi Iwai <[email protected]>
> > [bwh: Backported to 3.2: drop the change to AZX_DCAPS_PRESET_CTHDA]
>
> Is there a reason for dropping this change?  Adding the
> AZX_DCAPS_NO_64BIT flag to the AZX_DCAPS_PRESET_CTHDA definition does
> seem to make sense.
[...]

The AZX_DCAPS_PRESET_CTHDA macro was introduced in 3.5.

Ben.

--
Ben Hutchings
Unix is many things to many people,
but it's never been everything to anybody.



2015-11-26 00:39:47

by Ben Hutchings

Subject: Re: [PATCH 3.2 38/52] Btrfs: fix race when listing an inode's xattrs

On Wed, 2015-11-25 at 23:11 +0000, Luis Henriques wrote:
> On Tue, Nov 24, 2015 at 10:33:59PM +0000, Ben Hutchings wrote:
> > 3.2.74-rc1 review patch.  If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Filipe Manana <[email protected]>
> >
> > commit f1cd1f0b7d1b5d4aaa5711e8f4e4898b0045cb6d upstream.
> >
> > When listing an inode's xattrs we have a time window where we race against
> > a concurrent operation for adding a new hard link for our inode that makes
> > us not return any xattr to user space. In order for this to happen, the
> > first xattr of our inode needs to be at slot 0 of a leaf and the previous
> > leaf must still have room for an inode ref (or extref) item, and this can
> > happen because an inode's listxattrs callback does not lock the inode's
> > i_mutex (nor does the VFS do it for us), but adding a hard link to an
> > inode makes the VFS lock the inode's i_mutex before calling the inode's
> > link callback.
> >
> > If we have the following leafs:
> >
> >                Leaf X (has N items)                    Leaf Y
> >
> >  [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 XATTR_ITEM 12345), ... ]
> >            slot N - 2         slot N - 1              slot 0
> >
> > The race illustrated by the following sequence diagram is possible:
> >
> >        CPU 1                                               CPU 2
> >
> >   btrfs_listxattr()
> >
> >     searches for key (257 XATTR_ITEM 0)
> >
> >     gets path with path->nodes[0] == leaf X
> >     and path->slots[0] == N
> >
> >     because path->slots[0] is >=
> >     btrfs_header_nritems(leaf X), it calls
> >     btrfs_next_leaf()
> >
> >     btrfs_next_leaf()
> >       releases the path
> >
> >                                                    adds key (257 INODE_REF 666)
> >                                                    to the end of leaf X (slot N),
> >                                                    and leaf X now has N + 1 items
> >
> >       searches for the key (257 INODE_REF 256),
> >       with path->keep_locks == 1, because that
> >       is the last key it saw in leaf X before
> >       releasing the path
> >
> >       ends up at leaf X again and it verifies
> >       that the key (257 INODE_REF 256) is no
> >       longer the last key in leaf X, so it
> >       returns with path->nodes[0] == leaf X
> >       and path->slots[0] == N, pointing to
> >       the new item with key (257 INODE_REF 666)
> >
> >     btrfs_listxattr's loop iteration sees that
> >     the type of the key pointed by the path is
> >     different from the type BTRFS_XATTR_ITEM_KEY
> >     and so it breaks the loop and stops looking
> >     for more xattr items
> >       --> the application doesn't get any xattr
> >           listed for our inode
> >
> > So fix this by breaking the loop only if the key's type is greater than
> > BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.
> >
> > Signed-off-by: Filipe Manana <[email protected]>
> > [bwh: Backported to 3.2: s/found_key\.type/btrfs_key_type(\&found_key)/]
>
> Actually, in my backport to 3.16 I decided to keep the usage of
> 'found_key.type' instead, as the usage of btrfs_key_type() has been
> dropped with commit 962a298f3511 ("btrfs: kill the key type accessor
> helpers").
[...]

OK, that makes sense.  btrfs in 3.2 is pretty inconsistent about using
btrfs_key_type() anyway.

Ben.


--
Ben Hutchings
Unix is many things to many people,
but it's never been everything to anybody.



2015-11-26 09:38:00

by Filipe Manana

Subject: Re: [PATCH 3.2 38/52] Btrfs: fix race when listing an inode's xattrs



On 11/26/2015 12:39 AM, Ben Hutchings wrote:
> On Wed, 2015-11-25 at 23:11 +0000, Luis Henriques wrote:
>> On Tue, Nov 24, 2015 at 10:33:59PM +0000, Ben Hutchings wrote:
>>> 3.2.74-rc1 review patch. If anyone has any objections, please let me know.
>>>
>>> ------------------
>>>
>>> From: Filipe Manana <[email protected]>
>>>
>>> commit f1cd1f0b7d1b5d4aaa5711e8f4e4898b0045cb6d upstream.
>>>
>>> When listing an inode's xattrs we have a time window where we race against
>>> a concurrent operation for adding a new hard link for our inode that makes
>>> us not return any xattr to user space. In order for this to happen, the
>>> first xattr of our inode needs to be at slot 0 of a leaf and the previous
>>> leaf must still have room for an inode ref (or extref) item, and this can
>>> happen because an inode's listxattrs callback does not lock the inode's
>>> i_mutex (nor does the VFS do it for us), but adding a hard link to an
>>> inode makes the VFS lock the inode's i_mutex before calling the inode's
>>> link callback.
>>>
>>> If we have the following leafs:
>>>
>>> Leaf X (has N items) Leaf Y
>>>
>>> [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 XATTR_ITEM 12345), ... ]
>>> slot N - 2 slot N - 1 slot 0
>>>
>>> The race illustrated by the following sequence diagram is possible:
>>>
>>>            CPU 1                                          CPU 2
>>>
>>>   btrfs_listxattr()
>>>
>>>     searches for key (257 XATTR_ITEM 0)
>>>
>>>     gets path with path->nodes[0] == leaf X
>>>     and path->slots[0] == N
>>>
>>>     because path->slots[0] is >=
>>>     btrfs_header_nritems(leaf X), it calls
>>>     btrfs_next_leaf()
>>>
>>>     btrfs_next_leaf()
>>>       releases the path
>>>
>>>                                                  adds key (257 INODE_REF 666)
>>>                                                  to the end of leaf X (slot N),
>>>                                                  and leaf X now has N + 1 items
>>>
>>>       searches for the key (257 INODE_REF 256),
>>>       with path->keep_locks == 1, because that
>>>       is the last key it saw in leaf X before
>>>       releasing the path
>>>
>>>       ends up at leaf X again and it verifies
>>>       that the key (257 INODE_REF 256) is no
>>>       longer the last key in leaf X, so it
>>>       returns with path->nodes[0] == leaf X
>>>       and path->slots[0] == N, pointing to
>>>       the new item with key (257 INODE_REF 666)
>>>
>>>     btrfs_listxattr's loop iteration sees that
>>>     the type of the key pointed by the path is
>>>     different from the type BTRFS_XATTR_ITEM_KEY
>>>     and so it breaks the loop and stops looking
>>>     for more xattr items
>>>       --> the application doesn't get any xattr
>>>           listed for our inode
>>>
>>> So fix this by breaking the loop only if the key's type is greater than
>>> BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.
>>>
>>> Signed-off-by: Filipe Manana <[email protected]>
>>> [bwh: Backported to 3.2: s/found_key\.type/btrfs_key_type(\&found_key)/]
>>
>> Actually, in my backport to 3.16 I decided to keep the usage of
>> 'found_key.type' instead, as the usage of btrfs_key_type() has been
>> dropped with commit 962a298f3511 ("btrfs: kill the key type accessor
>> helpers").
> [...]
>
> OK, that makes sense. btrfs in 3.2 is pretty inconsistent about using
> btrfs_key_type() anyway.

Using the type field directly, instead of the accessor, is perfectly
safe (the field is a u8, so there are no worries about endianness
conversions, unlike the other fields of struct btrfs_key, which are u64s).
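
For illustration only, here is a minimal user-space sketch (not the kernel
code) of the loop exit condition before and after the fix, comparing the
key type field directly. struct sketch_key and both functions are
hypothetical stand-ins invented for this example; the key type constants
are assumed to match the values in the btrfs on-disk format headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the btrfs on-disk key type values. */
#define BTRFS_INODE_REF_KEY	12
#define BTRFS_XATTR_ITEM_KEY	24
#define BTRFS_EXTENT_DATA_KEY	108

/* Hypothetical, simplified stand-in for the CPU-order struct btrfs_key. */
struct sketch_key {
	uint64_t objectid;
	uint8_t type;		/* single byte: no endianness conversion needed */
	uint64_t offset;
};

/* Old behaviour: stop at the first key that is not an xattr item. */
static size_t sketch_count_xattrs_old(const struct sketch_key *keys,
				      size_t nr, uint64_t ino)
{
	size_t found = 0;

	for (size_t i = 0; i < nr; i++) {
		if (keys[i].objectid != ino)
			break;
		if (keys[i].type != BTRFS_XATTR_ITEM_KEY)
			break;
		found++;
	}
	return found;
}

/*
 * Fixed behaviour: keys are sorted by (objectid, type, offset), so a
 * smaller type means the xattr items may still follow (skip it), while
 * a larger type means we are past them (stop).
 */
static size_t sketch_count_xattrs_fixed(const struct sketch_key *keys,
					size_t nr, uint64_t ino)
{
	size_t found = 0;

	for (size_t i = 0; i < nr; i++) {
		if (keys[i].objectid != ino)
			break;
		if (keys[i].type > BTRFS_XATTR_ITEM_KEY)
			break;
		if (keys[i].type < BTRFS_XATTR_ITEM_KEY)
			continue;
		found++;
	}
	return found;
}
```

If the scan starts at the newly added (257 INODE_REF 666) item, as in the
race described in the commit message, the old condition stops immediately
and reports no xattrs, while the fixed condition skips that item and
reaches (257 XATTR_ITEM 12345).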

>
> Ben.
>
>

2015-11-26 10:30:26

by Luis Henriques

[permalink] [raw]
Subject: Re: [PATCH 3.2 22/52] ALSA: hda - Disable 64bit address for Creative HDA controllers

On Thu, Nov 26, 2015 at 12:34:33AM +0000, Ben Hutchings wrote:
> On Wed, 2015-11-25 at 23:05 +0000, Luis Henriques wrote:
> > On Tue, Nov 24, 2015 at 10:33:59PM +0000, Ben Hutchings wrote:
> > > 3.2.74-rc1 review patch.  If anyone has any objections, please let me know.
> > >
> > > ------------------
> > >
> > > From: Takashi Iwai <[email protected]>
> > >
> > > commit cadd16ea33a938d49aee99edd4758cc76048b399 upstream.
> > >
> > > We've had many reports that some Creative sound cards with CA0132
> > > don't work well.  Some reported that it starts working after reloading
> > > the module, while some reported it starts working when a 32bit kernel
> > > is used.  All these facts seem implying that the chip fails to
> > > communicate when the buffer is located in 64bit address.
> > >
> > > This patch addresses these issues by just adding AZX_DCAPS_NO_64BIT
> > > flag to the corresponding PCI entries.  I casually had a chance to
> > > test an SB Recon3D board, and indeed this seems helping.
> > >
> > > Although this hasn't been tested on all Creative devices, it's safer
> > > to assume that this restriction applies to the rest of them, too.  So
> > > the flag is applied to all Creative entries.
> > >
> > > Signed-off-by: Takashi Iwai <[email protected]>
> > > [bwh: Backported to 3.2: drop the change to AZX_DCAPS_PRESET_CTHDA]
> >
> > Is there a reason for dropping this change?  Adding the
> > AZX_DCAPS_NO_64BIT flag to the AZX_DCAPS_PRESET_CTHDA definition does
> > seem to make sense.
> [...]
>
> The AZX_DCAPS_PRESET_CTHDA macro was introduced in 3.5.
>

Doh, you're right. I was probably looking at the wrong branch. Sorry for
the noise.

Cheers,
--
Luís


> Ben.
>
> --
> Ben Hutchings
> Unix is many things to many people,
> but it's never been everything to anybody.

2015-11-27 05:23:18

by Daeho Jeong

[permalink] [raw]
Subject: Re: [PATCH 3.2 16/52] ext4, jbd2: ensure entering into panic after recording an error in superblock

It looks good. Thank you. :-)

------- Original Message -------
Sender : Ben Hutchings<[email protected]>
Date : 2015-11-25 07:33 (GMT+09:00)
Title : [PATCH 3.2 16/52] ext4, jbd2: ensure entering into panic after recording an error in superblock

3.2.74-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Daeho Jeong

commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 upstream.

If an EXT4 filesystem uses JBD2 journaling and an error occurs, the
journal is aborted first, the error number is recorded in the JBD2
superblock and, finally, the system panics if the "errors=panic" option
is set. In rare cases, however, this sequence is reordered as shown in
the figure below, and the system panics (which means a system reset
in a mobile environment) before the error has been recorded in the
journal superblock. In this case, e2fsck cannot recognize that a
filesystem failure occurred in the previous run, and the corruption
is not fixed.

Task A                        Task B
ext4_handle_error()
-> jbd2_journal_abort()
  -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    |                         __ext4_abort()
    |                         -> jbd2_journal_abort()
    |                         | -> __journal_abort_soft()
    |                         |   -> if (journal->j_flags & JBD2_ABORT)
    |                         |           return;
    |                         -> panic()
    |
    -> jbd2_journal_update_sb_errno()
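
The ordering problem above, and the check this patch adds before panic(),
can be modelled with a small single-threaded user-space sketch. This is
not the kernel code: struct sketch_journal and the sketch_* helpers are
hypothetical names invented for illustration, the abort path is split in
two only to expose the window between setting JBD2_ABORT and recording
the error, and JBD2_ABORT is assumed to be 0x002 as in include/linux/jbd2.h.

```c
#include <stdbool.h>
#include <stddef.h>

#define JBD2_ABORT	0x002	/* assumed value from include/linux/jbd2.h */
#define JBD2_REC_ERR	0x080	/* value introduced by this patch */

/* Hypothetical stand-in for the relevant parts of journal_t. */
struct sketch_journal {
	unsigned long j_flags;
	int j_errno;
	bool sb_errno_written;	/* models jbd2_journal_update_sb_errno() */
};

/*
 * First half of the abort path: mark the journal aborted.
 * Returns false if another task has already started the abort,
 * which is the early return Task B takes in the figure.
 */
static bool sketch_abort_begin(struct sketch_journal *j, int err)
{
	if (j->j_flags & JBD2_ABORT)
		return false;
	j->j_errno = err;
	j->j_flags |= JBD2_ABORT;
	return true;
}

/* Second half: record the errno in the superblock, then set the flag. */
static void sketch_abort_finish(struct sketch_journal *j)
{
	j->sb_errno_written = true;
	j->j_flags |= JBD2_REC_ERR;
}

/*
 * The check added before panic(): with a journal present, panicking is
 * allowed only once the error has been recorded. A NULL journal models
 * a filesystem without one.
 */
static bool sketch_may_panic(const struct sketch_journal *j)
{
	if (j && !(j->j_flags & JBD2_REC_ERR))
		return false;
	return true;
}
```

In this model, a second caller that arrives between sketch_abort_begin()
and sketch_abort_finish() sees JBD2_ABORT set but JBD2_REC_ERR clear, so
sketch_may_panic() returns false and the panic is left to the task that
finishes recording the error.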

Tested-by: Hobin Woo
Signed-off-by: Daeho Jeong
Signed-off-by: Theodore Ts'o
Signed-off-by: Ben Hutchings
---
fs/ext4/super.c | 12 ++++++++++--
fs/jbd2/journal.c | 6 +++++-
include/linux/jbd2.h | 1 +
3 files changed, 16 insertions(+), 3 deletions(-)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -463,9 +463,13 @@ static void ext4_handle_error(struct sup
 		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
 		sb->s_flags |= MS_RDONLY;
 	}
-	if (test_opt(sb, ERRORS_PANIC))
+	if (test_opt(sb, ERRORS_PANIC)) {
+		if (EXT4_SB(sb)->s_journal &&
+		  !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
+			return;
 		panic("EXT4-fs (device %s): panic forced after error\n",
 			sb->s_id);
+	}
 }

 void __ext4_error(struct super_block *sb, const char *function,
@@ -628,8 +632,12 @@ void __ext4_abort(struct super_block *sb
 		jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
 		save_error_info(sb, function, line);
 	}
-	if (test_opt(sb, ERRORS_PANIC))
+	if (test_opt(sb, ERRORS_PANIC)) {
+		if (EXT4_SB(sb)->s_journal &&
+		  !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
+			return;
 		panic("EXT4-fs panic from previous error\n");
+	}
 }

 void ext4_msg(struct super_block *sb, const char *prefix, const char *fmt, ...)
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1956,8 +1956,12 @@ static void __journal_abort_soft (journa

 	__jbd2_journal_abort_hard(journal);

-	if (errno)
+	if (errno) {
 		jbd2_journal_update_sb_errno(journal);
+		write_lock(&journal->j_state_lock);
+		journal->j_flags |= JBD2_REC_ERR;
+		write_unlock(&journal->j_state_lock);
+	}
 }

 /**
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -954,6 +954,7 @@ struct journal_s
 #define JBD2_ABORT_ON_SYNCDATA_ERR	0x040	/* Abort the journal on file
 						 * data write error in ordered
 						 * mode */
+#define JBD2_REC_ERR	0x080	/* The errno in the sb has been recorded */

 /*
  * Function declarations for the journaling transaction and buffer