2012-05-10 17:32:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 00/52] 3.3.6-stable review

This is the start of the stable review cycle for the 3.3.6 release.
There are 52 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sat May 12 17:31:31 UTC 2012.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.3.6-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Documentation/networking/ip-sysctl.txt | 4 +-
Makefile | 4 +-
arch/arm/kernel/ptrace.c | 24 ++--
arch/arm/kernel/smp.c | 4 +-
arch/arm/kernel/sys_arm.c | 2 +-
.../include/mach/ctrl_module_pad_core_44xx.h | 8 +-
arch/arm/mach-orion5x/mpp.h | 4 +-
arch/arm/mm/cache-l2x0.c | 25 ++--
arch/ia64/kvm/kvm-ia64.c | 5 +
arch/s390/kvm/intercept.c | 20 +--
arch/s390/kvm/kvm-s390.c | 2 +-
arch/x86/boot/compressed/relocs.c | 2 -
arch/x86/kernel/setup_percpu.c | 14 +-
arch/x86/kvm/pmu.c | 2 +-
arch/x86/kvm/vmx.c | 10 +-
arch/x86/kvm/x86.c | 19 ++-
arch/x86/xen/enlighten.c | 7 +-
arch/x86/xen/mmu.c | 7 +-
drivers/block/mtip32xx/Kconfig | 2 +-
drivers/block/mtip32xx/mtip32xx.c | 34 ++---
drivers/gpu/drm/i915/intel_hdmi.c | 2 +-
drivers/gpu/drm/i915/intel_ringbuffer.c | 9 +-
drivers/gpu/drm/i915/intel_sdvo.c | 6 +
drivers/net/ethernet/broadcom/tg3.c | 18 ++-
drivers/net/ethernet/intel/e1000/e1000_main.c | 35 +++--
drivers/net/ethernet/intel/igb/igb_main.c | 3 +-
drivers/net/ethernet/marvell/sky2.c | 31 +++--
drivers/net/ethernet/marvell/sky2.h | 1 -
drivers/net/ethernet/sun/sungem.c | 2 +-
drivers/net/usb/asix.c | 4 +-
drivers/net/usb/smsc95xx.c | 2 +-
drivers/platform/x86/sony-laptop.c | 2 +-
drivers/regulator/max8997.c | 2 +-
fs/cifs/cifssmb.c | 6 +-
fs/hugetlbfs/inode.c | 54 +++-----
fs/nfsd/nfs4proc.c | 8 +-
fs/nfsd/vfs.c | 2 +-
include/asm-generic/statfs.h | 2 +-
include/linux/hugetlb.h | 14 +-
include/linux/kvm_host.h | 7 +
include/linux/netdevice.h | 50 +++++---
include/linux/seqlock.h | 2 +-
mm/hugetlb.c | 135 ++++++++++++++++----
net/core/dev.c | 20 +++
net/ipv4/tcp.c | 9 +-
net/ipv4/tcp_input.c | 11 +-
net/l2tp/l2tp_ip.c | 3 +-
net/sched/sch_netem.c | 6 +-
sound/soc/codecs/tlv320aic23.c | 4 +-
sound/soc/soc-core.c | 6 +-
virt/kvm/iommu.c | 23 ++--
virt/kvm/kvm_main.c | 23 ++--
52 files changed, 462 insertions(+), 239 deletions(-)


2012-05-10 17:33:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 01/52] drm/i915: enable dip before writing data on gen4

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paulo Zanoni <[email protected]>

commit c1230df7e19e0f27655c0eb9d966c7e03be7cc50 upstream.

While testing with the intel_infoframes tool on gen4, I see that when
video DIP is disabled, what we write to the DATA memory is not exactly
what we read back later.

This regression has been introduce in

commit 64a8fc0145a1d0fdc25fc9367c2e6c621955fb3b
Author: Jesse Barnes <[email protected]>
Date: Thu Sep 22 11:16:00 2011 +0530

drm/i915: fix ILK+ infoframe support

That commit was setting VIDEO_DIP_CTL to 0 when initializing, which
caused the problem.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43947
Tested-by: Yang Guang <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
Reviewed-by: Eugeni Dodonov <[email protected]>
[danvet: Pimped commit message by using the usual commit citation
layout.]
Signed-off-by: Daniel Vetter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/i915/intel_hdmi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -136,7 +136,7 @@ static void i9xx_write_infoframe(struct

val &= ~VIDEO_DIP_SELECT_MASK;

- I915_WRITE(VIDEO_DIP_CTL, val | port | flags);
+ I915_WRITE(VIDEO_DIP_CTL, VIDEO_DIP_ENABLE | val | port | flags);

for (i = 0; i < len; i += 4) {
I915_WRITE(VIDEO_DIP_DATA, *data);

2012-05-10 17:33:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 02/52] smsc95xx: mark link down on startup and let PHY interrupt deal with carrier changes

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Pisati <[email protected]>

commit 07d69d4238418746a7b85c5d05ec17c658a2a390 upstream.

Without this patch sysfs reports the cable as present

flag@flag-desktop:~$ cat /sys/class/net/eth0/carrier
1

while it's not:

flag@flag-desktop:~$ sudo mii-tool eth0
eth0: no link

Tested on my Beagle XM.

v2: added mantainer to the list of recipient

Signed-off-by: Paolo Pisati <[email protected]>
Acked-by: Steve Glendinning <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/usb/smsc95xx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1191,7 +1191,7 @@ static const struct driver_info smsc95xx
.rx_fixup = smsc95xx_rx_fixup,
.tx_fixup = smsc95xx_tx_fixup,
.status = smsc95xx_status,
- .flags = FLAG_ETHER | FLAG_SEND_ZLP,
+ .flags = FLAG_ETHER | FLAG_SEND_ZLP | FLAG_LINK_INTR,
};

static const struct usb_device_id products[] = {

2012-05-10 17:33:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 04/52] xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEs

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konrad Rzeszutek Wilk <[email protected]>

commit b7e5ffe5d83fa40d702976d77452004abbe35791 upstream.

If I try to do "cat /sys/kernel/debug/kernel_page_tables"
I end up with:

BUG: unable to handle kernel paging request at ffffc7fffffff000
IP: [<ffffffff8106aa51>] ptdump_show+0x221/0x480
PGD 0
Oops: 0000 [#1] SMP
CPU 0
.. snip..
RAX: 0000000000000000 RBX: ffffc00000000fff RCX: 0000000000000000
RDX: 0000800000000000 RSI: 0000000000000000 RDI: ffffc7fffffff000

which is due to the fact we are trying to access a PFN that is not
accessible to us. The reason (at least in this case) was that
PGD[256] is set to __HYPERVISOR_VIRT_START which was setup (by the
hypervisor) to point to a read-only linear map of the MFN->PFN array.
During our parsing we would get the MFN (a valid one), try to look
it up in the MFN->PFN tree and find it invalid and return ~0 as PFN.
Then pte_mfn_to_pfn would happilly feed that in, attach the flags
and return it back to the caller. 'ptdump_show' bitshifts it and
gets and invalid value that it tries to dereference.

Instead of doing all of that, we detect the ~0 case and just
return !_PAGE_PRESENT.

This bug has been in existence .. at least until 2.6.37 (yikes!)

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/xen/mmu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -353,8 +353,13 @@ static pteval_t pte_mfn_to_pfn(pteval_t
{
if (val & _PAGE_PRESENT) {
unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
+ unsigned long pfn = mfn_to_pfn(mfn);
+
pteval_t flags = val & PTE_FLAGS_MASK;
- val = ((pteval_t)mfn_to_pfn(mfn) << PAGE_SHIFT) | flags;
+ if (unlikely(pfn == ~0))
+ val = flags & ~_PAGE_PRESENT;
+ else
+ val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
}

return val;

2012-05-10 17:33:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 07/52] drm/i915: Do no set Stencil Cache eviction LRA w/a on gen7+

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Vetter <[email protected]>

commit 2e7a44814d802c8ba479164b8924070cd908d6b5 upstream.

I've flagged this while reviewing the first version and Ken Graunke
fixed it up in v2, but unfortunately Dave Airlie picked up the wrong
version.

Cc: Dave Airlie <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Signed-Off-by: Daniel Vetter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -414,10 +414,8 @@ static int init_render_ring(struct intel
return ret;
}

- if (INTEL_INFO(dev)->gen >= 6) {
- I915_WRITE(INSTPM,
- INSTPM_FORCE_ORDERING << 16 | INSTPM_FORCE_ORDERING);

+ if (IS_GEN6(dev)) {
/* From the Sandybridge PRM, volume 1 part 3, page 24:
* "If this bit is set, STCunit will have LRA as replacement
* policy. [...] This bit must be reset. LRA replacement
@@ -427,6 +425,11 @@ static int init_render_ring(struct intel
CM0_STC_EVICT_DISABLE_LRA_SNB << CM0_MASK_SHIFT);
}

+ if (INTEL_INFO(dev)->gen >= 6) {
+ I915_WRITE(INSTPM,
+ INSTPM_FORCE_ORDERING << 16 | INSTPM_FORCE_ORDERING);
+ }
+
return ret;
}


2012-05-10 17:33:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 05/52] xen/pci: dont use PCI BIOS service for configuration space accesses

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Vrabel <[email protected]>

commit 76a8df7b49168509df02461f83fab117a4a86e08 upstream.

The accessing PCI configuration space with the PCI BIOS32 service does
not work in PV guests.

On systems without MMCONFIG or where the BIOS hasn't marked the
MMCONFIG region as reserved in the e820 map, the BIOS service is
probed (even though direct access is preferred) and this hangs.

Acked-by: Jan Beulich <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
[v1: Fixed compile error when CONFIG_PCI is not set]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/xen/enlighten.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -62,6 +62,7 @@
#include <asm/reboot.h>
#include <asm/stackprotector.h>
#include <asm/hypervisor.h>
+#include <asm/pci_x86.h>

#include "xen-ops.h"
#include "mmu.h"
@@ -1274,8 +1275,10 @@ asmlinkage void __init xen_start_kernel(
/* Make sure ACS will be enabled */
pci_request_acs();
}
-
-
+#ifdef CONFIG_PCI
+ /* PCI BIOS service won't work from a PV guest. */
+ pci_probe &= ~PCI_PROBE_BIOS;
+#endif
xen_raw_console_write("about to get started...\n");

xen_setup_runstate_info(0);

2012-05-10 17:34:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 14/52] Fix __read_seqcount_begin() to use ACCESS_ONCE for sequence value read

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <[email protected]>

commit 2f624278626677bfaf73fef97f86b37981621f5c upstream.

We really need to use a ACCESS_ONCE() on the sequence value read in
__read_seqcount_begin(), because otherwise the compiler might end up
reloading the value in between the test and the return of it. As a
result, it might end up returning an odd value (which means that a write
is in progress).

If the reader is then fast enough that that odd value is still the
current one when the read_seqcount_retry() is done, we might end up with
a "successful" read sequence, even despite the concurrent write being
active.

In practice this probably never really happens - there just isn't
anything else going on around the read of the sequence count, and the
common case is that we end up having a read barrier immediately
afterwards.

So the code sequence in which gcc might decide to reaload from memory is
small, and there's no reason to believe it would ever actually do the
reload. But if the compiler ever were to decide to do so, it would be
incredibly annoying to debug. Let's just make sure.

Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/linux/seqlock.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -141,7 +141,7 @@ static inline unsigned __read_seqcount_b
unsigned ret;

repeat:
- ret = s->sequence;
+ ret = ACCESS_ONCE(s->sequence);
if (unlikely(ret & 1)) {
cpu_relax();
goto repeat;

2012-05-10 17:34:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 19/52] regulator: Fix the logic to ensure new voltage setting in valid range

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Axel Lin <[email protected]>

commit f55205f4d4a8823a11bb8b37ef2ecbd78fb09463 upstream.

I think this is a typo.
To ensure new voltage setting won't greater than desc->max,
the equation should be desc->min + desc->step * new_val <= desc->max.

Signed-off-by: Axel Lin <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/regulator/max8997.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/regulator/max8997.c
+++ b/drivers/regulator/max8997.c
@@ -689,7 +689,7 @@ static int max8997_set_voltage_buck(stru
}

new_val++;
- } while (desc->min + desc->step + new_val <= desc->max);
+ } while (desc->min + desc->step * new_val <= desc->max);

new_idx = tmp_idx;
new_val = tmp_val;

2012-05-10 17:34:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 21/52] ARM: OMAP: Revert "ARM: OMAP: ctrl: Fix CONTROL_DSIPHY register fields"

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Archit Taneja <[email protected]>

commit 08ca7444f589bedf9ad5d82883e5d0754852d73b upstream.

This reverts commit 46f8c3c7e95c0d30d95911e7975ddc4f93b3e237.

The commit above swapped the DSI1_PPID and DSI2_PPID register fields in
CONTROL_DSIPHY to be in sync with the newer public OMAP TRMs(after version V).

With this commit, contention errors were reported on DSI lanes some OMAP4 SDPs.
After probing the DSI lanes on OMAP4 SDP, it was seen that setting bits in the
DSI2_PPID field was pulling up voltage on DSI1 lanes, and DSI1_PPID field was
pulling up voltage on DSI2 lanes.

This proves that the current version of OMAP4 TRM is incorrect, swap the
position of register fields according to the older TRM versions as they were
correct.

Acked-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Archit Taneja <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/mach-omap2/include/mach/ctrl_module_pad_core_44xx.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/arch/arm/mach-omap2/include/mach/ctrl_module_pad_core_44xx.h
+++ b/arch/arm/mach-omap2/include/mach/ctrl_module_pad_core_44xx.h
@@ -941,10 +941,10 @@
#define OMAP4_DSI2_LANEENABLE_MASK (0x7 << 29)
#define OMAP4_DSI1_LANEENABLE_SHIFT 24
#define OMAP4_DSI1_LANEENABLE_MASK (0x1f << 24)
-#define OMAP4_DSI2_PIPD_SHIFT 19
-#define OMAP4_DSI2_PIPD_MASK (0x1f << 19)
-#define OMAP4_DSI1_PIPD_SHIFT 14
-#define OMAP4_DSI1_PIPD_MASK (0x1f << 14)
+#define OMAP4_DSI1_PIPD_SHIFT 19
+#define OMAP4_DSI1_PIPD_MASK (0x1f << 19)
+#define OMAP4_DSI2_PIPD_SHIFT 14
+#define OMAP4_DSI2_PIPD_MASK (0x1f << 14)

/* CONTROL_MCBSPLP */
#define OMAP4_ALBCTRLRX_FSX_SHIFT 31

2012-05-10 17:34:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 23/52] netem: fix possible skb leak

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Eric Dumazet <[email protected]>

[ Upstream commit 116a0fc31c6c9b8fc821be5a96e5bf0b43260131 ]

skb_checksum_help(skb) can return an error, we must free skb in this
case. qdisc_drop(skb, sch) can also be feeded with a NULL skb (if
skb_unshare() failed), so lets use this generic helper.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/sch_netem.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -408,10 +408,8 @@ static int netem_enqueue(struct sk_buff
if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
(skb->ip_summed == CHECKSUM_PARTIAL &&
- skb_checksum_help(skb))) {
- sch->qstats.drops++;
- return NET_XMIT_DROP;
- }
+ skb_checksum_help(skb)))
+ return qdisc_drop(skb, sch);

skb->data[net_random() % skb_headlen(skb)] ^= 1<<(net_random() % 8);
}

2012-05-10 17:34:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 24/52] net: In unregister_netdevice_notifier unregister the netdevices.

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: "Eric W. Biederman" <[email protected]>

[ Upstream commit 7d3d43dab4e978d8d9ad1acf8af15c9b1c4b0f0f ]

We already synthesize events in register_netdevice_notifier and synthesizing
events in unregister_netdevice_notifier allows to us remove the need for
special case cleanup code.

This change should be safe as it adds no new cases for existing callers
of unregiser_netdevice_notifier to handle.

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/dev.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1412,14 +1412,34 @@ EXPORT_SYMBOL(register_netdevice_notifie
* register_netdevice_notifier(). The notifier is unlinked into the
* kernel structures and may then be reused. A negative errno code
* is returned on a failure.
+ *
+ * After unregistering unregister and down device events are synthesized
+ * for all devices on the device list to the removed notifier to remove
+ * the need for special case cleanup code.
*/

int unregister_netdevice_notifier(struct notifier_block *nb)
{
+ struct net_device *dev;
+ struct net *net;
int err;

rtnl_lock();
err = raw_notifier_chain_unregister(&netdev_chain, nb);
+ if (err)
+ goto unlock;
+
+ for_each_net(net) {
+ for_each_netdev(net, dev) {
+ if (dev->flags & IFF_UP) {
+ nb->notifier_call(nb, NETDEV_GOING_DOWN, dev);
+ nb->notifier_call(nb, NETDEV_DOWN, dev);
+ }
+ nb->notifier_call(nb, NETDEV_UNREGISTER, dev);
+ nb->notifier_call(nb, NETDEV_UNREGISTER_BATCH, dev);
+ }
+ }
+unlock:
rtnl_unlock();
return err;
}

2012-05-10 17:34:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 30/52] tcp: fix infinite cwnd in tcp_complete_cwr()

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Yuchung Cheng <[email protected]>

[ Upstream commit 1cebce36d660c83bd1353e41f3e66abd4686f215 ]

When the cwnd reduction is done, ssthresh may be infinite
if TCP enters CWR via ECN or F-RTO. If cwnd is not undone, i.e.,
undo_marker is set, tcp_complete_cwr() falsely set cwnd to the
infinite ssthresh value. The correct operation is to keep cwnd
intact because it has been updated in ECN or F-RTO.

Signed-off-by: Yuchung Cheng <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2866,11 +2866,14 @@ static inline void tcp_complete_cwr(stru

/* Do not moderate cwnd if it's already undone in cwr or recovery. */
if (tp->undo_marker) {
- if (inet_csk(sk)->icsk_ca_state == TCP_CA_CWR)
+ if (inet_csk(sk)->icsk_ca_state == TCP_CA_CWR) {
tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh);
- else /* PRR */
+ tp->snd_cwnd_stamp = tcp_time_stamp;
+ } else if (tp->snd_ssthresh < TCP_INFINITE_SSTHRESH) {
+ /* PRR algorithm. */
tp->snd_cwnd = tp->snd_ssthresh;
- tp->snd_cwnd_stamp = tcp_time_stamp;
+ tp->snd_cwnd_stamp = tcp_time_stamp;
+ }
}
tcp_ca_event(sk, CA_EVENT_COMPLETE_CWR);
}

2012-05-10 17:35:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 31/52] tcp: change tcp_adv_win_scale and tcp_rmem[2]

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Eric Dumazet <[email protected]>

[ Upstream commit b49960a05e32121d29316cfdf653894b88ac9190 ]

tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb->len / skb->truesize ratio of 75% (3/4)

In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.

With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 < 1728

(GRO can escape this problem because it build skbs with a too low
truesize.)

This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.

We should adjust the len/truesize ratio to 50% instead of 75%

This patch :

1) changes tcp_adv_win_scale default to 1 instead of 2

2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/networking/ip-sysctl.txt | 4 ++--
net/ipv4/tcp.c | 9 +++++----
net/ipv4/tcp_input.c | 2 +-
3 files changed, 8 insertions(+), 7 deletions(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -147,7 +147,7 @@ tcp_adv_win_scale - INTEGER
(if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
if it is <= 0.
Possible values are [-31, 31], inclusive.
- Default: 2
+ Default: 1

tcp_allowed_congestion_control - STRING
Show/set the congestion control choices available to non-privileged
@@ -410,7 +410,7 @@ tcp_rmem - vector of 3 INTEGERs: min, de
net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables
automatic tuning of that socket's receive buffer size, in which
case this value is ignored.
- Default: between 87380B and 4MB, depending on RAM size.
+ Default: between 87380B and 6MB, depending on RAM size.

tcp_sack - BOOLEAN
Enable select acknowledgments (SACKS).
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3240,7 +3240,7 @@ void __init tcp_init(void)
{
struct sk_buff *skb = NULL;
unsigned long limit;
- int max_share, cnt;
+ int max_rshare, max_wshare, cnt;
unsigned int i;
unsigned long jiffy = jiffies;

@@ -3300,15 +3300,16 @@ void __init tcp_init(void)
tcp_init_mem(&init_net);
/* Set per-socket limits to no more than 1/128 the pressure threshold */
limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
- max_share = min(4UL*1024*1024, limit);
+ max_wshare = min(4UL*1024*1024, limit);
+ max_rshare = min(6UL*1024*1024, limit);

sysctl_tcp_wmem[0] = SK_MEM_QUANTUM;
sysctl_tcp_wmem[1] = 16*1024;
- sysctl_tcp_wmem[2] = max(64*1024, max_share);
+ sysctl_tcp_wmem[2] = max(64*1024, max_wshare);

sysctl_tcp_rmem[0] = SK_MEM_QUANTUM;
sysctl_tcp_rmem[1] = 87380;
- sysctl_tcp_rmem[2] = max(87380, max_share);
+ sysctl_tcp_rmem[2] = max(87380, max_rshare);

printk(KERN_INFO "TCP: Hash tables configured "
"(established %u bind %u)\n",
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -83,7 +83,7 @@ int sysctl_tcp_ecn __read_mostly = 2;
EXPORT_SYMBOL(sysctl_tcp_ecn);
int sysctl_tcp_dsack __read_mostly = 1;
int sysctl_tcp_app_win __read_mostly = 31;
-int sysctl_tcp_adv_win_scale __read_mostly = 2;
+int sysctl_tcp_adv_win_scale __read_mostly = 1;
EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);

int sysctl_tcp_stdurg __read_mostly;

2012-05-10 17:35:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 32/52] net: Add memory barriers to prevent possible race in byte queue limits

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Alexander Duyck <[email protected]>

[ Upstream commit b37c0fbe3f6dfba1f8ad2aed47fb40578a254635 ]

This change adds a memory barrier to the byte queue limit code to address a
possible race as has been seen in the past with the
netif_stop_queue/netif_wake_queue logic.

Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/netdevice.h | 49 ++++++++++++++++++++++++++++++----------------
1 file changed, 33 insertions(+), 16 deletions(-)

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1898,12 +1898,22 @@ static inline void netdev_tx_sent_queue(
{
#ifdef CONFIG_BQL
dql_queued(&dev_queue->dql, bytes);
- if (unlikely(dql_avail(&dev_queue->dql) < 0)) {
- set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);
- if (unlikely(dql_avail(&dev_queue->dql) >= 0))
- clear_bit(__QUEUE_STATE_STACK_XOFF,
- &dev_queue->state);
- }
+
+ if (likely(dql_avail(&dev_queue->dql) >= 0))
+ return;
+
+ set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);
+
+ /*
+ * The XOFF flag must be set before checking the dql_avail below,
+ * because in netdev_tx_completed_queue we update the dql_completed
+ * before checking the XOFF flag.
+ */
+ smp_mb();
+
+ /* check again in case another CPU has just made room avail */
+ if (unlikely(dql_avail(&dev_queue->dql) >= 0))
+ clear_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);
#endif
}

@@ -1916,16 +1926,23 @@ static inline void netdev_tx_completed_q
unsigned pkts, unsigned bytes)
{
#ifdef CONFIG_BQL
- if (likely(bytes)) {
- dql_completed(&dev_queue->dql, bytes);
- if (unlikely(test_bit(__QUEUE_STATE_STACK_XOFF,
- &dev_queue->state) &&
- dql_avail(&dev_queue->dql) >= 0)) {
- if (test_and_clear_bit(__QUEUE_STATE_STACK_XOFF,
- &dev_queue->state))
- netif_schedule_queue(dev_queue);
- }
- }
+ if (unlikely(!bytes))
+ return;
+
+ dql_completed(&dev_queue->dql, bytes);
+
+ /*
+ * Without the memory barrier there is a small possiblity that
+ * netdev_tx_sent_queue will miss the update and cause the queue to
+ * be stopped forever
+ */
+ smp_mb();
+
+ if (dql_avail(&dev_queue->dql) < 0)
+ return;
+
+ if (test_and_clear_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state))
+ netif_schedule_queue(dev_queue);
#endif
}


2012-05-10 17:35:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 27/52] sky2: fix receive length error in mixed non-VLAN/VLAN traffic

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: stephen hemminger <[email protected]>

[ Upstream commit e072b3fad5f3915102c94628b4971f52ff99dd05 ]

Bug: The VLAN bit of the MAC RX Status Word is unreliable in several older
supported chips. Sometimes the VLAN bit is not set for valid VLAN packets
and also sometimes the VLAN bit is set for non-VLAN packets that came after
a VLAN packet. This results in a receive length error when VLAN hardware
tagging is enabled.

Fix: Variation on original fix proposed by Mirko.
The VLAN information is decoded in the status loop, and can be
applied to the received SKB there. This eliminates the need for the
separate tag field in the interface data structure. The tag has to
be copied and cleared if packet is copied. This version checked out
with vlan and normal traffic.

Note: vlan_tx_tag_present should be renamed vlan_tag_present, but that
is outside scope of this.

Reported-by: Mirko Lindner <[email protected]>
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/marvell/sky2.c | 28 +++++++++++++++++-----------
drivers/net/ethernet/marvell/sky2.h | 1 -
2 files changed, 17 insertions(+), 12 deletions(-)

--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -2484,9 +2484,11 @@ static struct sk_buff *receive_copy(stru
skb->ip_summed = re->skb->ip_summed;
skb->csum = re->skb->csum;
skb->rxhash = re->skb->rxhash;
+ skb->vlan_tci = re->skb->vlan_tci;

pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
length, PCI_DMA_FROMDEVICE);
+ re->skb->vlan_tci = 0;
re->skb->rxhash = 0;
re->skb->ip_summed = CHECKSUM_NONE;
skb_put(skb, length);
@@ -2572,9 +2574,6 @@ static struct sk_buff *sky2_receive(stru
struct sk_buff *skb = NULL;
u16 count = (status & GMR_FS_LEN) >> 16;

- if (status & GMR_FS_VLAN)
- count -= VLAN_HLEN; /* Account for vlan tag */
-
netif_printk(sky2, rx_status, KERN_DEBUG, dev,
"rx slot %u status 0x%x len %d\n",
sky2->rx_next, status, length);
@@ -2582,6 +2581,9 @@ static struct sk_buff *sky2_receive(stru
sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
prefetch(sky2->rx_ring + sky2->rx_next);

+ if (vlan_tx_tag_present(re->skb))
+ count -= VLAN_HLEN; /* Account for vlan tag */
+
/* This chip has hardware problems that generates bogus status.
* So do only marginal checking and expect higher level protocols
* to handle crap frames.
@@ -2639,11 +2641,8 @@ static inline void sky2_tx_done(struct n
}

static inline void sky2_skb_rx(const struct sky2_port *sky2,
- u32 status, struct sk_buff *skb)
+ struct sk_buff *skb)
{
- if (status & GMR_FS_VLAN)
- __vlan_hwaccel_put_tag(skb, be16_to_cpu(sky2->rx_tag));
-
if (skb->ip_summed == CHECKSUM_NONE)
netif_receive_skb(skb);
else
@@ -2697,6 +2696,14 @@ static void sky2_rx_checksum(struct sky2
}
}

+static void sky2_rx_tag(struct sky2_port *sky2, u16 length)
+{
+ struct sk_buff *skb;
+
+ skb = sky2->rx_ring[sky2->rx_next].skb;
+ __vlan_hwaccel_put_tag(skb, be16_to_cpu(length));
+}
+
static void sky2_rx_hash(struct sky2_port *sky2, u32 status)
{
struct sk_buff *skb;
@@ -2755,8 +2762,7 @@ static int sky2_status_intr(struct sky2_
}

skb->protocol = eth_type_trans(skb, dev);
-
- sky2_skb_rx(sky2, status, skb);
+ sky2_skb_rx(sky2, skb);

/* Stop after net poll weight */
if (++work_done >= to_do)
@@ -2764,11 +2770,11 @@ static int sky2_status_intr(struct sky2_
break;

case OP_RXVLAN:
- sky2->rx_tag = length;
+ sky2_rx_tag(sky2, length);
break;

case OP_RXCHKSVLAN:
- sky2->rx_tag = length;
+ sky2_rx_tag(sky2, length);
/* fall through */
case OP_RXCHKS:
if (likely(dev->features & NETIF_F_RXCSUM))
--- a/drivers/net/ethernet/marvell/sky2.h
+++ b/drivers/net/ethernet/marvell/sky2.h
@@ -2241,7 +2241,6 @@ struct sky2_port {
u16 rx_pending;
u16 rx_data_size;
u16 rx_nfrags;
- u16 rx_tag;

struct {
unsigned long last;

2012-05-10 17:34:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 28/52] sungem: Fix WakeOnLan

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Gerard Lledo <[email protected]>

[ Upstream commit 5a8887d39e1ba5ee2d4ccb94b14d6f2dce5ddfca ]

WakeOnLan was broken in this driver because gp->asleep_wol is a 1-bit
bitfield and it was being assigned WAKE_MAGIC, which is (1 << 5).
gp->asleep_wol remains 0 and the machine never wakes up. Fixed by casting
gp->wake_on_lan to bool. Tested on an iBook G4.

Signed-off-by: Gerard Lledo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/sun/sungem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -2340,7 +2340,7 @@ static int gem_suspend(struct pci_dev *p
netif_device_detach(dev);

/* Switch off chip, remember WOL setting */
- gp->asleep_wol = gp->wake_on_lan;
+ gp->asleep_wol = !!gp->wake_on_lan;
gem_do_stop(dev, gp->asleep_wol);

/* Unlock the network stack */

2012-05-10 17:36:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 37/52] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Takuya Yoshikawa <[email protected]>

(cherry picked from commit 565f3be2174611f364405bbea2d86e153c2e7e78)

Other threads may process the same page in that small window and skip
TLB flush and then return before these functions do flush.

Signed-off-by: Takuya Yoshikawa <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
virt/kvm/kvm_main.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -289,15 +289,15 @@ static void kvm_mmu_notifier_invalidate_
*/
idx = srcu_read_lock(&kvm->srcu);
spin_lock(&kvm->mmu_lock);
+
kvm->mmu_notifier_seq++;
need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty;
- spin_unlock(&kvm->mmu_lock);
- srcu_read_unlock(&kvm->srcu, idx);
-
/* we've to flush the tlb before the pages can be freed */
if (need_tlb_flush)
kvm_flush_remote_tlbs(kvm);

+ spin_unlock(&kvm->mmu_lock);
+ srcu_read_unlock(&kvm->srcu, idx);
}

static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
@@ -335,12 +335,12 @@ static void kvm_mmu_notifier_invalidate_
for (; start < end; start += PAGE_SIZE)
need_tlb_flush |= kvm_unmap_hva(kvm, start);
need_tlb_flush |= kvm->tlbs_dirty;
- spin_unlock(&kvm->mmu_lock);
- srcu_read_unlock(&kvm->srcu, idx);
-
/* we've to flush the tlb before the pages can be freed */
if (need_tlb_flush)
kvm_flush_remote_tlbs(kvm);
+
+ spin_unlock(&kvm->mmu_lock);
+ srcu_read_unlock(&kvm->srcu, idx);
}

static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
@@ -378,13 +378,14 @@ static int kvm_mmu_notifier_clear_flush_

idx = srcu_read_lock(&kvm->srcu);
spin_lock(&kvm->mmu_lock);
- young = kvm_age_hva(kvm, address);
- spin_unlock(&kvm->mmu_lock);
- srcu_read_unlock(&kvm->srcu, idx);

+ young = kvm_age_hva(kvm, address);
if (young)
kvm_flush_remote_tlbs(kvm);

+ spin_unlock(&kvm->mmu_lock);
+ srcu_read_unlock(&kvm->srcu, idx);
+
return young;
}


2012-05-10 17:36:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 42/52] KVM: VMX: vmx_set_cr0 expects kvm->srcu locked

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Marcelo Tosatti <[email protected]>

(cherry picked from commit 7a4f5ad051e02139a9f1c0f7f4b1acb88915852b)

vmx_set_cr0 is called from vcpu run context, therefore it expects
kvm->srcu to be held (for setting up the real-mode TSS).

Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kvm/vmx.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3918,7 +3918,9 @@ static int vmx_vcpu_reset(struct kvm_vcp
vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);

vmx->vcpu.arch.cr0 = X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
+ vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
vmx_set_cr0(&vmx->vcpu, kvm_read_cr0(vcpu)); /* enter rmode */
+ srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
vmx_set_cr4(&vmx->vcpu, 0);
vmx_set_efer(&vmx->vcpu, 0);
vmx_fpu_activate(&vmx->vcpu);

2012-05-10 17:36:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 44/52] KVM: lock slots_lock around device assignment

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Alex Williamson <[email protected]>

(cherry picked from commit 21a1416a1c945c5aeaeaf791b63c64926018eb77)

As pointed out by Jason Baron, when assigning a device to a guest
we first set the iommu domain pointer, which enables mapping
and unmapping of memory slots to the iommu. This leaves a window
where this path is enabled, but we haven't synchronized the iommu
mappings to the existing memory slots. Thus a slot being removed
at that point could send us down unexpected code paths removing
non-existent pinnings and iommu mappings. Take the slots_lock
around creating the iommu domain and initial mappings as well as
around iommu teardown to avoid this race.

Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
virt/kvm/iommu.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -240,9 +240,13 @@ int kvm_iommu_map_guest(struct kvm *kvm)
return -ENODEV;
}

+ mutex_lock(&kvm->slots_lock);
+
kvm->arch.iommu_domain = iommu_domain_alloc(&pci_bus_type);
- if (!kvm->arch.iommu_domain)
- return -ENOMEM;
+ if (!kvm->arch.iommu_domain) {
+ r = -ENOMEM;
+ goto out_unlock;
+ }

if (!allow_unsafe_assigned_interrupts &&
!iommu_domain_has_cap(kvm->arch.iommu_domain,
@@ -253,17 +257,16 @@ int kvm_iommu_map_guest(struct kvm *kvm)
" module option.\n", __func__);
iommu_domain_free(kvm->arch.iommu_domain);
kvm->arch.iommu_domain = NULL;
- return -EPERM;
+ r = -EPERM;
+ goto out_unlock;
}

r = kvm_iommu_map_memslots(kvm);
if (r)
- goto out_unmap;
-
- return 0;
+ kvm_iommu_unmap_memslots(kvm);

-out_unmap:
- kvm_iommu_unmap_memslots(kvm);
+out_unlock:
+ mutex_unlock(&kvm->slots_lock);
return r;
}

@@ -340,7 +343,11 @@ int kvm_iommu_unmap_guest(struct kvm *kv
if (!domain)
return 0;

+ mutex_lock(&kvm->slots_lock);
kvm_iommu_unmap_memslots(kvm);
+ kvm->arch.iommu_domain = NULL;
+ mutex_unlock(&kvm->slots_lock);
+
iommu_domain_free(domain);
return 0;
}

2012-05-10 17:36:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 45/52] sony-laptop: Enable keyboard backlight by default

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Josh Boyer <[email protected]>

commit 6fe6ae56a7cebaebc2e6daa11c423e4692f9b592 upstream.

When the keyboard backlight support was originally added, the commit said
to default it to on with a 10 second timeout. That actually wasn't the
case, as the default value is commented out for the kbd_backlight parameter.
Because it is a static variable, it gets set to 0 by default without some
other form of initialization.

However, it seems the function to set the value wasn't actually called
immediately, so whatever state the keyboard was in initially would remain.
Then commit df410d522410e67660 was introduced during the 2.6.39 timeframe to
immediately set whatever value was present (as well as attempt to
restore/reset the state on module removal or resume). That seems to have
now forced the light off immediately when the module is loaded unless
the option kbd_backlight=1 is specified.

Let's enable it by default again (for the first time). This should solve
https://bugzilla.redhat.com/show_bug.cgi?id=728478

Signed-off-by: Josh Boyer <[email protected]>
Acked-by: Mattia Dongili <[email protected]>
Signed-off-by: Matthew Garrett <[email protected]>
Cc: maximilian attems <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/platform/x86/sony-laptop.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/platform/x86/sony-laptop.c
+++ b/drivers/platform/x86/sony-laptop.c
@@ -127,7 +127,7 @@ MODULE_PARM_DESC(minor,
"default is -1 (automatic)");
#endif

-static int kbd_backlight; /* = 1 */
+static int kbd_backlight = 1;
module_param(kbd_backlight, int, 0444);
MODULE_PARM_DESC(kbd_backlight,
"set this to 0 to disable keyboard backlight, "

2012-05-10 17:36:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 46/52] hugepages: fix use after free bug in "quota" handling

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Gibson <[email protected]>

commit 90481622d75715bfcb68501280a917dbfe516029 upstream.

hugetlbfs_{get,put}_quota() are badly named. They don't interact with the
general quota handling code, and they don't much resemble its behaviour.
Rather than being about maintaining limits on on-disk block usage by
particular users, they are instead about maintaining limits on in-memory
page usage (including anonymous MAP_PRIVATE copied-on-write pages)
associated with a particular hugetlbfs filesystem instance.

Worse, they work by having callbacks to the hugetlbfs filesystem code from
the low-level page handling code, in particular from free_huge_page().
This is a layering violation of itself, but more importantly, if the
kernel does a get_user_pages() on hugepages (which can happen from KVM
amongst others), then the free_huge_page() can be delayed until after the
associated inode has already been freed. If an unmount occurs at the
wrong time, even the hugetlbfs superblock where the "quota" limits are
stored may have been freed.

Andrew Barry proposed a patch to fix this by having hugepages, instead of
storing a pointer to their address_space and reaching the superblock from
there, had the hugepages store pointers directly to the superblock,
bumping the reference count as appropriate to avoid it being freed.
Andrew Morton rejected that version, however, on the grounds that it made
the existing layering violation worse.

This is a reworked version of Andrew's patch, which removes the extra, and
some of the existing, layering violation. It works by introducing the
concept of a hugepage "subpool" at the lower hugepage mm layer - that is a
finite logical pool of hugepages to allocate from. hugetlbfs now creates
a subpool for each filesystem instance with a page limit set, and a
pointer to the subpool gets added to each allocated hugepage, instead of
the address_space pointer used now. The subpool has its own lifetime and
is only freed once all pages in it _and_ all other references to it (i.e.
superblocks) are gone.

subpools are optional - a NULL subpool pointer is taken by the code to
mean that no subpool limits are in effect.

Previous discussion of this bug found in: "Fix refcounting in hugetlbfs
quota handling.". See: https://lkml.org/lkml/2011/8/11/28 or
http://marc.info/?l=linux-mm&m=126928970510627&w=1

v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to
alloc_huge_page() - since it already takes the vma, it is not necessary.

Signed-off-by: Andrew Barry <[email protected]>
Signed-off-by: David Gibson <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/hugetlbfs/inode.c | 54 +++++++------------
include/linux/hugetlb.h | 14 +++-
mm/hugetlb.c | 135 ++++++++++++++++++++++++++++++++++++++----------
3 files changed, 139 insertions(+), 64 deletions(-)

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -600,9 +600,15 @@ static int hugetlbfs_statfs(struct dentr
spin_lock(&sbinfo->stat_lock);
/* If no limits set, just report 0 for max/free/used
* blocks, like simple_statfs() */
- if (sbinfo->max_blocks >= 0) {
- buf->f_blocks = sbinfo->max_blocks;
- buf->f_bavail = buf->f_bfree = sbinfo->free_blocks;
+ if (sbinfo->spool) {
+ long free_pages;
+
+ spin_lock(&sbinfo->spool->lock);
+ buf->f_blocks = sbinfo->spool->max_hpages;
+ free_pages = sbinfo->spool->max_hpages
+ - sbinfo->spool->used_hpages;
+ buf->f_bavail = buf->f_bfree = free_pages;
+ spin_unlock(&sbinfo->spool->lock);
buf->f_files = sbinfo->max_inodes;
buf->f_ffree = sbinfo->free_inodes;
}
@@ -618,6 +624,10 @@ static void hugetlbfs_put_super(struct s

if (sbi) {
sb->s_fs_info = NULL;
+
+ if (sbi->spool)
+ hugepage_put_subpool(sbi->spool);
+
kfree(sbi);
}
}
@@ -848,10 +858,14 @@ hugetlbfs_fill_super(struct super_block
sb->s_fs_info = sbinfo;
sbinfo->hstate = config.hstate;
spin_lock_init(&sbinfo->stat_lock);
- sbinfo->max_blocks = config.nr_blocks;
- sbinfo->free_blocks = config.nr_blocks;
sbinfo->max_inodes = config.nr_inodes;
sbinfo->free_inodes = config.nr_inodes;
+ sbinfo->spool = NULL;
+ if (config.nr_blocks != -1) {
+ sbinfo->spool = hugepage_new_subpool(config.nr_blocks);
+ if (!sbinfo->spool)
+ goto out_free;
+ }
sb->s_maxbytes = MAX_LFS_FILESIZE;
sb->s_blocksize = huge_page_size(config.hstate);
sb->s_blocksize_bits = huge_page_shift(config.hstate);
@@ -870,38 +884,12 @@ hugetlbfs_fill_super(struct super_block
sb->s_root = root;
return 0;
out_free:
+ if (sbinfo->spool)
+ kfree(sbinfo->spool);
kfree(sbinfo);
return -ENOMEM;
}

-int hugetlb_get_quota(struct address_space *mapping, long delta)
-{
- int ret = 0;
- struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
-
- if (sbinfo->free_blocks > -1) {
- spin_lock(&sbinfo->stat_lock);
- if (sbinfo->free_blocks - delta >= 0)
- sbinfo->free_blocks -= delta;
- else
- ret = -ENOMEM;
- spin_unlock(&sbinfo->stat_lock);
- }
-
- return ret;
-}
-
-void hugetlb_put_quota(struct address_space *mapping, long delta)
-{
- struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
-
- if (sbinfo->free_blocks > -1) {
- spin_lock(&sbinfo->stat_lock);
- sbinfo->free_blocks += delta;
- spin_unlock(&sbinfo->stat_lock);
- }
-}
-
static struct dentry *hugetlbfs_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -14,6 +14,15 @@ struct user_struct;
#include <linux/shm.h>
#include <asm/tlbflush.h>

+struct hugepage_subpool {
+ spinlock_t lock;
+ long count;
+ long max_hpages, used_hpages;
+};
+
+struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
+void hugepage_put_subpool(struct hugepage_subpool *spool);
+
int PageHuge(struct page *page);

void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
@@ -138,12 +147,11 @@ struct hugetlbfs_config {
};

struct hugetlbfs_sb_info {
- long max_blocks; /* blocks allowed */
- long free_blocks; /* blocks free */
long max_inodes; /* inodes allowed */
long free_inodes; /* inodes free */
spinlock_t stat_lock;
struct hstate *hstate;
+ struct hugepage_subpool *spool;
};


@@ -166,8 +174,6 @@ extern const struct file_operations huge
extern const struct vm_operations_struct hugetlb_vm_ops;
struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
struct user_struct **user, int creat_flags);
-int hugetlb_get_quota(struct address_space *mapping, long delta);
-void hugetlb_put_quota(struct address_space *mapping, long delta);

static inline int is_file_hugepages(struct file *file)
{
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -53,6 +53,84 @@ static unsigned long __initdata default_
*/
static DEFINE_SPINLOCK(hugetlb_lock);

+static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
+{
+ bool free = (spool->count == 0) && (spool->used_hpages == 0);
+
+ spin_unlock(&spool->lock);
+
+ /* If no pages are used, and no other handles to the subpool
+ * remain, free the subpool the subpool remain */
+ if (free)
+ kfree(spool);
+}
+
+struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
+{
+ struct hugepage_subpool *spool;
+
+ spool = kmalloc(sizeof(*spool), GFP_KERNEL);
+ if (!spool)
+ return NULL;
+
+ spin_lock_init(&spool->lock);
+ spool->count = 1;
+ spool->max_hpages = nr_blocks;
+ spool->used_hpages = 0;
+
+ return spool;
+}
+
+void hugepage_put_subpool(struct hugepage_subpool *spool)
+{
+ spin_lock(&spool->lock);
+ BUG_ON(!spool->count);
+ spool->count--;
+ unlock_or_release_subpool(spool);
+}
+
+static int hugepage_subpool_get_pages(struct hugepage_subpool *spool,
+ long delta)
+{
+ int ret = 0;
+
+ if (!spool)
+ return 0;
+
+ spin_lock(&spool->lock);
+ if ((spool->used_hpages + delta) <= spool->max_hpages) {
+ spool->used_hpages += delta;
+ } else {
+ ret = -ENOMEM;
+ }
+ spin_unlock(&spool->lock);
+
+ return ret;
+}
+
+static void hugepage_subpool_put_pages(struct hugepage_subpool *spool,
+ long delta)
+{
+ if (!spool)
+ return;
+
+ spin_lock(&spool->lock);
+ spool->used_hpages -= delta;
+ /* If hugetlbfs_put_super couldn't free spool due to
+ * an outstanding quota reference, free it now. */
+ unlock_or_release_subpool(spool);
+}
+
+static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
+{
+ return HUGETLBFS_SB(inode->i_sb)->spool;
+}
+
+static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
+{
+ return subpool_inode(vma->vm_file->f_dentry->d_inode);
+}
+
/*
* Region tracking -- allows tracking of reservations and instantiated pages
* across the pages in a mapping.
@@ -533,9 +611,9 @@ static void free_huge_page(struct page *
*/
struct hstate *h = page_hstate(page);
int nid = page_to_nid(page);
- struct address_space *mapping;
+ struct hugepage_subpool *spool =
+ (struct hugepage_subpool *)page_private(page);

- mapping = (struct address_space *) page_private(page);
set_page_private(page, 0);
page->mapping = NULL;
BUG_ON(page_count(page));
@@ -551,8 +629,7 @@ static void free_huge_page(struct page *
enqueue_huge_page(h, page);
}
spin_unlock(&hugetlb_lock);
- if (mapping)
- hugetlb_put_quota(mapping, 1);
+ hugepage_subpool_put_pages(spool, 1);
}

static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
@@ -966,11 +1043,12 @@ static void return_unused_surplus_pages(
/*
* Determine if the huge page at addr within the vma has an associated
* reservation. Where it does not we will need to logically increase
- * reservation and actually increase quota before an allocation can occur.
- * Where any new reservation would be required the reservation change is
- * prepared, but not committed. Once the page has been quota'd allocated
- * an instantiated the change should be committed via vma_commit_reservation.
- * No action is required on failure.
+ * reservation and actually increase subpool usage before an allocation
+ * can occur. Where any new reservation would be required the
+ * reservation change is prepared, but not committed. Once the page
+ * has been allocated from the subpool and instantiated the change should
+ * be committed via vma_commit_reservation. No action is required on
+ * failure.
*/
static long vma_needs_reservation(struct hstate *h,
struct vm_area_struct *vma, unsigned long addr)
@@ -1019,24 +1097,24 @@ static void vma_commit_reservation(struc
static struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)
{
+ struct hugepage_subpool *spool = subpool_vma(vma);
struct hstate *h = hstate_vma(vma);
struct page *page;
- struct address_space *mapping = vma->vm_file->f_mapping;
- struct inode *inode = mapping->host;
long chg;

/*
- * Processes that did not create the mapping will have no reserves and
- * will not have accounted against quota. Check that the quota can be
- * made before satisfying the allocation
- * MAP_NORESERVE mappings may also need pages and quota allocated
- * if no reserve mapping overlaps.
+ * Processes that did not create the mapping will have no
+ * reserves and will not have accounted against subpool
+ * limit. Check that the subpool limit can be made before
+ * satisfying the allocation MAP_NORESERVE mappings may also
+ * need pages and subpool limit allocated allocated if no reserve
+ * mapping overlaps.
*/
chg = vma_needs_reservation(h, vma, addr);
if (chg < 0)
return ERR_PTR(-VM_FAULT_OOM);
if (chg)
- if (hugetlb_get_quota(inode->i_mapping, chg))
+ if (hugepage_subpool_get_pages(spool, chg))
return ERR_PTR(-VM_FAULT_SIGBUS);

spin_lock(&hugetlb_lock);
@@ -1046,12 +1124,12 @@ static struct page *alloc_huge_page(stru
if (!page) {
page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
if (!page) {
- hugetlb_put_quota(inode->i_mapping, chg);
+ hugepage_subpool_put_pages(spool, chg);
return ERR_PTR(-VM_FAULT_SIGBUS);
}
}

- set_page_private(page, (unsigned long) mapping);
+ set_page_private(page, (unsigned long)spool);

vma_commit_reservation(h, vma, addr);

@@ -2072,6 +2150,7 @@ static void hugetlb_vm_op_close(struct v
{
struct hstate *h = hstate_vma(vma);
struct resv_map *reservations = vma_resv_map(vma);
+ struct hugepage_subpool *spool = subpool_vma(vma);
unsigned long reserve;
unsigned long start;
unsigned long end;
@@ -2087,7 +2166,7 @@ static void hugetlb_vm_op_close(struct v

if (reserve) {
hugetlb_acct_memory(h, -reserve);
- hugetlb_put_quota(vma->vm_file->f_mapping, reserve);
+ hugepage_subpool_put_pages(spool, reserve);
}
}
}
@@ -2316,7 +2395,7 @@ static int unmap_ref_private(struct mm_s
*/
address = address & huge_page_mask(h);
pgoff = vma_hugecache_offset(h, vma, address);
- mapping = (struct address_space *)page_private(page);
+ mapping = vma->vm_file->f_dentry->d_inode->i_mapping;

/*
* Take the mapping lock for the duration of the table walk. As
@@ -2871,11 +2950,12 @@ int hugetlb_reserve_pages(struct inode *
{
long ret, chg;
struct hstate *h = hstate_inode(inode);
+ struct hugepage_subpool *spool = subpool_inode(inode);

/*
* Only apply hugepage reservation if asked. At fault time, an
* attempt will be made for VM_NORESERVE to allocate a page
- * and filesystem quota without using reserves
+ * without using reserves
*/
if (vm_flags & VM_NORESERVE)
return 0;
@@ -2902,17 +2982,17 @@ int hugetlb_reserve_pages(struct inode *
if (chg < 0)
return chg;

- /* There must be enough filesystem quota for the mapping */
- if (hugetlb_get_quota(inode->i_mapping, chg))
+ /* There must be enough pages in the subpool for the mapping */
+ if (hugepage_subpool_get_pages(spool, chg))
return -ENOSPC;

/*
* Check enough hugepages are available for the reservation.
- * Hand back the quota if there are not
+ * Hand the pages back to the subpool if there are not
*/
ret = hugetlb_acct_memory(h, chg);
if (ret < 0) {
- hugetlb_put_quota(inode->i_mapping, chg);
+ hugepage_subpool_put_pages(spool, chg);
return ret;
}

@@ -2936,12 +3016,13 @@ void hugetlb_unreserve_pages(struct inod
{
struct hstate *h = hstate_inode(inode);
long chg = region_truncate(&inode->i_mapping->private_list, offset);
+ struct hugepage_subpool *spool = subpool_inode(inode);

spin_lock(&inode->i_lock);
inode->i_blocks -= (blocks_per_huge_page(h) * freed);
spin_unlock(&inode->i_lock);

- hugetlb_put_quota(inode->i_mapping, (chg - freed));
+ hugepage_subpool_put_pages(spool, (chg - freed));
hugetlb_acct_memory(h, -(chg - freed));
}


2012-05-10 17:37:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 48/52] mtip32xx: fix error handling in mtip_init()

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ryosuke Saito <[email protected]>

commit 6d27f09a6398ee086b11804aa3a16609876f0c7c upstream.

Ensure that block device is properly unregistered, if
pci_register_driver() fails.

Signed-off-by: Ryosuke Saito <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/block/mtip32xx/mtip32xx.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3605,18 +3605,25 @@ MODULE_DEVICE_TABLE(pci, mtip_pci_tbl);
*/
static int __init mtip_init(void)
{
+ int error;
+
printk(KERN_INFO MTIP_DRV_NAME " Version " MTIP_DRV_VERSION "\n");

/* Allocate a major block device number to use with this driver. */
- mtip_major = register_blkdev(0, MTIP_DRV_NAME);
- if (mtip_major < 0) {
+ error = register_blkdev(0, MTIP_DRV_NAME);
+ if (error <= 0) {
printk(KERN_ERR "Unable to register block device (%d)\n",
- mtip_major);
+ error);
return -EBUSY;
}
+ mtip_major = error;

/* Register our PCI operations. */
- return pci_register_driver(&mtip_pci_driver);
+ error = pci_register_driver(&mtip_pci_driver);
+ if (error)
+ unregister_blkdev(mtip_major, MTIP_DRV_NAME);
+
+ return error;
}

/*

2012-05-10 17:37:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 43/52] KVM: VMX: Fix kvm_set_shared_msr() called in preemptible context

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Avi Kivity <[email protected]>

(cherry picked from commit 2225fd56049643c1a7d645c0ce9d499d43c7974e)

kvm_set_shared_msr() may not be called in preemptible context,
but vmx_set_msr() does so:

BUG: using smp_processor_id() in preemptible [00000000] code: qemu-kvm/22713
caller is kvm_set_shared_msr+0x32/0xa0 [kvm]
Pid: 22713, comm: qemu-kvm Not tainted 3.4.0-rc3+ #39
Call Trace:
[<ffffffff8131fa82>] debug_smp_processor_id+0xe2/0x100
[<ffffffffa0328ae2>] kvm_set_shared_msr+0x32/0xa0 [kvm]
[<ffffffffa03a103b>] vmx_set_msr+0x28b/0x2d0 [kvm_intel]
...

Making kvm_set_shared_msr() work in preemptible is cleaner, but
it's used in the fast path. Making two variants is overkill, so
this patch just disables preemption around the call.

Reported-by: Dave Jones <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kvm/vmx.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2219,9 +2219,12 @@ static int vmx_set_msr(struct kvm_vcpu *
msr = find_msr_entry(vmx, msr_index);
if (msr) {
msr->data = data;
- if (msr - vmx->guest_msrs < vmx->save_nmsrs)
+ if (msr - vmx->guest_msrs < vmx->save_nmsrs) {
+ preempt_disable();
kvm_set_shared_msr(msr->index, msr->data,
msr->mask);
+ preempt_enable();
+ }
break;
}
ret = kvm_set_msr_common(vcpu, msr_index, data);

2012-05-10 17:37:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 33/52] net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Alexander Duyck <[email protected]>

[ Upstream commit 5c4903549c05bbb373479e0ce2992573c120654a ]

We are seeing dev_watchdog hangs on several drivers. I suspect this is due
to the __QUEUE_STATE_STACK_XOFF bit being set prior to a reset for link
change, and then not being cleared by netdev_tx_reset_queue. This change
corrects that.

In addition we were seeing dev_watchdog hangs on igb after running the
ethtool tests. We found this to be due to the fact that the ethtool test
runs the same logic as ndo_start_xmit, but we were never clearing the XOFF
flag since the loopback test in ethtool does not do byte queue accounting.

Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
include/linux/netdevice.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2750,6 +2750,8 @@ void igb_configure_tx_ring(struct igb_ad

txdctl |= E1000_TXDCTL_QUEUE_ENABLE;
wr32(E1000_TXDCTL(reg_idx), txdctl);
+
+ netdev_tx_reset_queue(txring_txq(ring));
}

/**
@@ -3242,7 +3244,6 @@ static void igb_clean_tx_ring(struct igb
buffer_info = &tx_ring->tx_buffer_info[i];
igb_unmap_and_free_tx_resource(tx_ring, buffer_info);
}
- netdev_tx_reset_queue(txring_txq(tx_ring));

size = sizeof(struct igb_tx_buffer) * tx_ring->count;
memset(tx_ring->tx_buffer_info, 0, size);
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1955,6 +1955,7 @@ static inline void netdev_completed_queu
static inline void netdev_tx_reset_queue(struct netdev_queue *q)
{
#ifdef CONFIG_BQL
+ clear_bit(__QUEUE_STATE_STACK_XOFF, &q->state);
dql_reset(&q->dql);
#endif
}

2012-05-10 17:37:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 51/52] ARM: 7397/1: l2x0: only apply workaround for erratum #753970 on PL310

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Will Deacon <[email protected]>

commit f154fe9b806574437b47f08e924ad10c0e240b23 upstream.

The workaround for PL310 erratum #753970 can lead to deadlock on systems
with an L220 cache controller.

This patch makes the workaround effective only when the cache controller
is identified as a PL310 at probe time.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/mm/cache-l2x0.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -32,6 +32,7 @@ static void __iomem *l2x0_base;
static DEFINE_RAW_SPINLOCK(l2x0_lock);
static uint32_t l2x0_way_mask; /* Bitmask of active ways */
static uint32_t l2x0_size;
+static unsigned long sync_reg_offset = L2X0_CACHE_SYNC;

struct l2x0_regs l2x0_saved_regs;

@@ -61,12 +62,7 @@ static inline void cache_sync(void)
{
void __iomem *base = l2x0_base;

-#ifdef CONFIG_PL310_ERRATA_753970
- /* write to an unmmapped register */
- writel_relaxed(0, base + L2X0_DUMMY_REG);
-#else
- writel_relaxed(0, base + L2X0_CACHE_SYNC);
-#endif
+ writel_relaxed(0, base + sync_reg_offset);
cache_wait(base + L2X0_CACHE_SYNC, 1);
}

@@ -331,6 +327,10 @@ void __init l2x0_init(void __iomem *base
else
ways = 8;
type = "L310";
+#ifdef CONFIG_PL310_ERRATA_753970
+ /* Unmapped register. */
+ sync_reg_offset = L2X0_DUMMY_REG;
+#endif
break;
case L2X0_CACHE_ID_PART_L210:
ways = (aux >> 13) & 0xf;

2012-05-10 17:38:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 52/52] ARM: 7398/1: l2x0: only write to debug registers on PL310

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Will Deacon <[email protected]>

commit ab4d536890853ab6675ede65db40e2c0980cb0ea upstream.

PL310 errata #588369 and #727915 require writes to the debug registers
of the cache controller to work around known problems. Writing these
registers on L220 may cause deadlock, so ensure that we only perform
this operation when we identify a PL310 at probe time.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/mm/cache-l2x0.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -81,10 +81,13 @@ static inline void l2x0_inv_line(unsigne
}

#if defined(CONFIG_PL310_ERRATA_588369) || defined(CONFIG_PL310_ERRATA_727915)
+static inline void debug_writel(unsigned long val)
+{
+ if (outer_cache.set_debug)
+ outer_cache.set_debug(val);
+}

-#define debug_writel(val) outer_cache.set_debug(val)
-
-static void l2x0_set_debug(unsigned long val)
+static void pl310_set_debug(unsigned long val)
{
writel_relaxed(val, l2x0_base + L2X0_DEBUG_CTRL);
}
@@ -94,7 +97,7 @@ static inline void debug_writel(unsigned
{
}

-#define l2x0_set_debug NULL
+#define pl310_set_debug NULL
#endif

#ifdef CONFIG_PL310_ERRATA_588369
@@ -331,6 +334,7 @@ void __init l2x0_init(void __iomem *base
/* Unmapped register. */
sync_reg_offset = L2X0_DUMMY_REG;
#endif
+ outer_cache.set_debug = pl310_set_debug;
break;
case L2X0_CACHE_ID_PART_L210:
ways = (aux >> 13) & 0xf;
@@ -379,7 +383,6 @@ void __init l2x0_init(void __iomem *base
outer_cache.flush_all = l2x0_flush_all;
outer_cache.inv_all = l2x0_inv_all;
outer_cache.disable = l2x0_disable;
- outer_cache.set_debug = l2x0_set_debug;

printk(KERN_INFO "%s cache controller enabled\n", type);
printk(KERN_INFO "l2x0: %d ways, CACHE_ID 0x%08x, AUX_CTRL 0x%08x, Cache size: %d B\n",

2012-05-10 17:39:39

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 49/52] block: mtip32xx: remove HOTPLUG_PCI_PCIE dependancy

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Greg Kroah-Hartman <[email protected]>

commit 63634806519b49bb43f37e53a1e8366eb3e846a4 upstream.

This removes the HOTPLUG_PCI_PCIE dependency on the driver and makes it
depend on PCI.

Cc: Sam Bradshaw <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Asai Thambi S P <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

---
drivers/block/mtip32xx/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/block/mtip32xx/Kconfig
+++ b/drivers/block/mtip32xx/Kconfig
@@ -4,6 +4,6 @@

config BLK_DEV_PCIESSD_MTIP32XX
tristate "Block Device Driver for Micron PCIe SSDs"
- depends on HOTPLUG_PCI_PCIE
+ depends on PCI
help
This enables the block driver for Micron PCIe SSDs.

2012-05-10 17:39:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 50/52] nfsd: dont fail unchecked creates of non-special files

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: "J. Bruce Fields" <[email protected]>

commit 9dc4e6c4d1182d34604ea40fef641775f5b15456 upstream.

Allow a v3 unchecked open of a non-regular file succeed as if it were a
lookup; typically a client in such a case will want to fall back on a
local open, so succeeding and giving it the filehandle is more useful
than failing with nfserr_exist, which makes it appear that nothing at
all exists by that name.

Similarly for v4, on an open-create, return the same errors we would on
an attempt to open a non-regular file, instead of returning
nfserr_exist.

This fixes a problem found doing a v4 open of a symlink with
O_RDONLY|O_CREAT, which resulted in the current client returning EEXIST.

Thanks also to Trond for analysis.

Reported-by: Orion Poplawski <[email protected]>
Tested-by: Orion Poplawski <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[bwh: Backported to 3.2 and 3.3: use &resfh, not resfh]
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/nfsd/nfs4proc.c | 8 ++++----
fs/nfsd/vfs.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)

--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -231,17 +231,17 @@ do_open_lookup(struct svc_rqst *rqstp, s
*/
if (open->op_createmode == NFS4_CREATE_EXCLUSIVE && status == 0)
open->op_bmval[1] = (FATTR4_WORD1_TIME_ACCESS |
- FATTR4_WORD1_TIME_MODIFY);
+ FATTR4_WORD1_TIME_MODIFY);
} else {
status = nfsd_lookup(rqstp, current_fh,
open->op_fname.data, open->op_fname.len, &resfh);
fh_unlock(current_fh);
- if (status)
- goto out;
- status = nfsd_check_obj_isreg(&resfh);
}
if (status)
goto out;
+ status = nfsd_check_obj_isreg(&resfh);
+ if (status)
+ goto out;

if (is_create_with_attrs(open) && open->op_acl != NULL)
do_set_nfs4_acl(rqstp, &resfh, open->op_acl, open->op_bmval);
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1450,7 +1450,7 @@ do_nfsd_create(struct svc_rqst *rqstp, s
switch (createmode) {
case NFS3_CREATE_UNCHECKED:
if (! S_ISREG(dchild->d_inode->i_mode))
- err = nfserr_exist;
+ goto out;
else if (truncp) {
/* in nfsv4, we need to treat this case a little
* differently. we don't want to truncate the

2012-05-10 17:41:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 47/52] mtip32xx: fix incorrect value set for drv_cleanup_done, and re-initialize and start port in mtip_restart_port()

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Asai Thambi S P <[email protected]>

commit 22be2e6e13ac09b20000582ac34d47fb0029a6da upstream.

This patch includes two changes:
* fix incorrect value set for drv_cleanup_done
* re-initialize and start port in mtip_restart_port()

Signed-off-by: Asai Thambi S P <[email protected]>
Signed-off-by: Sam Bradshaw <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/block/mtip32xx/mtip32xx.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -422,6 +422,10 @@ static void mtip_init_port(struct mtip_p
/* Clear any pending interrupts for this port */
writel(readl(port->mmio + PORT_IRQ_STAT), port->mmio + PORT_IRQ_STAT);

+ /* Clear any pending interrupts on the HBA. */
+ writel(readl(port->dd->mmio + HOST_IRQ_STAT),
+ port->dd->mmio + HOST_IRQ_STAT);
+
/* Enable port interrupts */
writel(DEF_PORT_IRQ, port->mmio + PORT_IRQ_MASK);
}
@@ -490,11 +494,9 @@ static void mtip_restart_port(struct mti
dev_warn(&port->dd->pdev->dev,
"COM reset failed\n");

- /* Clear SError, the PxSERR.DIAG.x should be set so clear it */
- writel(readl(port->mmio + PORT_SCR_ERR), port->mmio + PORT_SCR_ERR);
+ mtip_init_port(port);
+ mtip_start_port(port);

- /* Enable the DMA engine */
- mtip_enable_engine(port, 1);
}

/*
@@ -3359,9 +3361,6 @@ static int mtip_pci_probe(struct pci_dev
return -ENOMEM;
}

- /* Set the atomic variable as 1 in case of SRSI */
- atomic_set(&dd->drv_cleanup_done, true);
-
atomic_set(&dd->resumeflag, false);

/* Attach the private data to this PCI device. */
@@ -3434,8 +3433,8 @@ iomap_err:
pci_set_drvdata(pdev, NULL);
return rv;
done:
- /* Set the atomic variable as 0 in case of SRSI */
- atomic_set(&dd->drv_cleanup_done, true);
+ /* Set the atomic variable as 0 */
+ atomic_set(&dd->drv_cleanup_done, false);

return rv;
}
@@ -3463,8 +3462,6 @@ static void mtip_pci_remove(struct pci_d
}
}
}
- /* Set the atomic variable as 1 in case of SRSI */
- atomic_set(&dd->drv_cleanup_done, true);

/* Clean up the block layer. */
mtip_block_remove(dd);

2012-05-10 17:41:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 35/52] KVM: s390: Sanitize fpc registers for KVM_SET_FPU

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Christian Borntraeger <[email protected]>

(cherry picked from commit 851755871c1f3184f4124c466e85881f17fa3226)

commit 7eef87dc99e419b1cc051e4417c37e4744d7b661 (KVM: s390: fix
register setting) added a load of the floating point control register
to the KVM_SET_FPU path. Lets make sure that the fpc is valid.

Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -418,7 +418,7 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct
int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
memcpy(&vcpu->arch.guest_fpregs.fprs, &fpu->fprs, sizeof(fpu->fprs));
- vcpu->arch.guest_fpregs.fpc = fpu->fpc;
+ vcpu->arch.guest_fpregs.fpc = fpu->fpc & FPC_VALID_MASK;
restore_fp_regs(&vcpu->arch.guest_fpregs);
return 0;
}

2012-05-10 17:36:31

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 39/52] KVM: Ensure all vcpus are consistent with in-kernel irqchip settings

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Avi Kivity <[email protected]>

(cherry picked from commit 3e515705a1f46beb1c942bb8043c16f8ac7b1e9e)

If some vcpus are created before KVM_CREATE_IRQCHIP, then
irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
to potential NULL pointer dereferences.

Fix by:
- ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
- ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP

This is somewhat long winded because vcpu->arch.apic is created without
kvm->lock held.

Based on earlier patch by Michael Ellerman.

Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/ia64/kvm/kvm-ia64.c | 5 +++++
arch/x86/kvm/x86.c | 8 ++++++++
include/linux/kvm_host.h | 7 +++++++
virt/kvm/kvm_main.c | 4 ++++
4 files changed, 24 insertions(+)

--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1169,6 +1169,11 @@ out:

#define PALE_RESET_ENTRY 0x80000000ffffffb0UL

+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
+{
+ return irqchip_in_kernel(vcpu->kcm) == (vcpu->arch.apic != NULL);
+}
+
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
struct kvm_vcpu *v;
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3131,6 +3131,9 @@ long kvm_arch_vm_ioctl(struct file *filp
r = -EEXIST;
if (kvm->arch.vpic)
goto create_irqchip_unlock;
+ r = -EINVAL;
+ if (atomic_read(&kvm->online_vcpus))
+ goto create_irqchip_unlock;
r = -ENOMEM;
vpic = kvm_create_pic(kvm);
if (vpic) {
@@ -5956,6 +5959,11 @@ void kvm_arch_check_processor_compat(voi
kvm_x86_ops->check_processor_compatibility(rtn);
}

+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
+{
+ return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL);
+}
+
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
struct page *page;
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -775,6 +775,13 @@ static inline bool kvm_vcpu_is_bsp(struc
{
return vcpu->kvm->bsp_vcpu_id == vcpu->vcpu_id;
}
+
+bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu);
+
+#else
+
+static inline bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu) { return true; }
+
#endif

#ifdef __KVM_HAVE_DEVICE_ASSIGNMENT
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1720,6 +1720,10 @@ static int kvm_vm_ioctl_create_vcpu(stru
goto vcpu_destroy;

mutex_lock(&kvm->lock);
+ if (!kvm_vcpu_compatible(vcpu)) {
+ r = -EINVAL;
+ goto unlock_vcpu_destroy;
+ }
if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
r = -EINVAL;
goto unlock_vcpu_destroy;

2012-05-10 17:42:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 41/52] KVM: nVMX: Fix erroneous exception bitmap check

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Nadav Har'El <[email protected]>

(cherry picked from commit 9587190107d0c0cbaccbf7bf6b0245d29095a9ae)

The code which checks whether to inject a pagefault to L1 or L2 (in
nested VMX) was wrong, incorrect in how it checked the PF_VECTOR bit.
Thanks to Dan Carpenter for spotting this.

Signed-off-by: Nadav Har'El <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kvm/vmx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1678,7 +1678,7 @@ static int nested_pf_handled(struct kvm_
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
- if (!(vmcs12->exception_bitmap & PF_VECTOR))
+ if (!(vmcs12->exception_bitmap & (1u << PF_VECTOR)))
return 0;

nested_vmx_vmexit(vcpu);

2012-05-10 17:42:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 40/52] KVM: VMX: Fix delayed load of shared MSRs

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Avi Kivity <[email protected]>

(cherry picked from commit 9ee73970c03edb68146ceb1ba2a7033c99a5e017)

Shared MSRs (MSR_*STAR and related) are stored in both vmx->guest_msrs
and in the CPU registers, but vmx_set_msr() only updated memory. Prior
to 46199f33c2953, this didn't matter, since we called vmx_load_host_state(),
which scheduled a vmx_save_host_state(), which re-synchronized the CPU
state, but now we don't, so the CPU state will not be synchronized until
the next exit to host userspace. This mostly affects nested vmx workloads,
which play with these MSRs a lot.

Fix by loading the MSR eagerly.

Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kvm/vmx.c | 3 +++
1 file changed, 3 insertions(+)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2219,6 +2219,9 @@ static int vmx_set_msr(struct kvm_vcpu *
msr = find_msr_entry(vmx, msr_index);
if (msr) {
msr->data = data;
+ if (msr - vmx->guest_msrs < vmx->save_nmsrs)
+ kvm_set_shared_msr(msr->index, msr->data,
+ msr->mask);
break;
}
ret = kvm_set_msr_common(vcpu, msr_index, data);

2012-05-10 17:42:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 38/52] KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Gleb Natapov <[email protected]>

(cherry picked from commit 270c6c79f4e15e599f47174ecedad932463af7a2)


Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kvm/pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -413,7 +413,7 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vc
struct kvm_pmc *counters;
u64 ctr;

- pmc &= (3u << 30) - 1;
+ pmc &= ~(3u << 30);
if (!fixed && pmc >= pmu->nr_arch_gp_counters)
return 1;
if (fixed && pmc >= pmu->nr_arch_fixed_counters)

2012-05-10 17:36:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 34/52] KVM: s390: do store status after handling STOP_ON_STOP bit

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Jens Freimann <[email protected]>

(cherry picked from commit 9e0d5473e2f0ba2d2fe9dab9408edef3060b710e)

In handle_stop() handle the stop bit before doing the store status as
described for "Stop and Store Status" in the Principles of Operation.
We have to give up the local_int.lock before calling kvm store status
since it calls gmap_fault() which might sleep. Since local_int.lock
only protects local_int.* and not guest memory we can give up the lock.

Signed-off-by: Jens Freimann <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/s390/kvm/intercept.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -133,13 +133,6 @@ static int handle_stop(struct kvm_vcpu *

vcpu->stat.exit_stop_request++;
spin_lock_bh(&vcpu->arch.local_int.lock);
- if (vcpu->arch.local_int.action_bits & ACTION_STORE_ON_STOP) {
- vcpu->arch.local_int.action_bits &= ~ACTION_STORE_ON_STOP;
- rc = kvm_s390_vcpu_store_status(vcpu,
- KVM_S390_STORE_STATUS_NOADDR);
- if (rc >= 0)
- rc = -EOPNOTSUPP;
- }

if (vcpu->arch.local_int.action_bits & ACTION_RELOADVCPU_ON_STOP) {
vcpu->arch.local_int.action_bits &= ~ACTION_RELOADVCPU_ON_STOP;
@@ -155,7 +148,18 @@ static int handle_stop(struct kvm_vcpu *
rc = -EOPNOTSUPP;
}

- spin_unlock_bh(&vcpu->arch.local_int.lock);
+ if (vcpu->arch.local_int.action_bits & ACTION_STORE_ON_STOP) {
+ vcpu->arch.local_int.action_bits &= ~ACTION_STORE_ON_STOP;
+ /* store status must be called unlocked. Since local_int.lock
+ * only protects local_int.* and not guest memory we can give
+ * up the lock here */
+ spin_unlock_bh(&vcpu->arch.local_int.lock);
+ rc = kvm_s390_vcpu_store_status(vcpu,
+ KVM_S390_STORE_STATUS_NOADDR);
+ if (rc >= 0)
+ rc = -EOPNOTSUPP;
+ } else
+ spin_unlock_bh(&vcpu->arch.local_int.lock);
return rc;
}


2012-05-10 17:44:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 36/52] KVM: Fix write protection race during dirty logging

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Takuya Yoshikawa <[email protected]>

(cherry picked from commit 6dbf79e7164e9a86c1e466062c48498142ae6128)

This patch fixes a race introduced by:

commit 95d4c16ce78cb6b7549a09159c409d52ddd18dae
KVM: Optimize dirty logging by rmap_write_protect()

During protecting pages for dirty logging, other threads may also try
to protect a page in mmu_sync_children() or kvm_mmu_get_page().

In such a case, because get_dirty_log releases mmu_lock before flushing
TLB's, the following race condition can happen:

A (get_dirty_log) B (another thread)

lock(mmu_lock)
clear pte.w
unlock(mmu_lock)
lock(mmu_lock)
pte.w is already cleared
unlock(mmu_lock)
skip TLB flush
return
...
TLB flush

Though thread B assumes the page has already been protected when it
returns, the remaining TLB entry will break that assumption.

This patch fixes this problem by making get_dirty_log hold the mmu_lock
until it flushes the TLB's.

Signed-off-by: Takuya Yoshikawa <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kvm/x86.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2997,6 +2997,8 @@ static void write_protect_slot(struct kv
unsigned long *dirty_bitmap,
unsigned long nr_dirty_pages)
{
+ spin_lock(&kvm->mmu_lock);
+
/* Not many dirty pages compared to # of shadow pages. */
if (nr_dirty_pages < kvm->arch.n_used_mmu_pages) {
unsigned long gfn_offset;
@@ -3004,16 +3006,13 @@ static void write_protect_slot(struct kv
for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
unsigned long gfn = memslot->base_gfn + gfn_offset;

- spin_lock(&kvm->mmu_lock);
kvm_mmu_rmap_write_protect(kvm, gfn, memslot);
- spin_unlock(&kvm->mmu_lock);
}
kvm_flush_remote_tlbs(kvm);
- } else {
- spin_lock(&kvm->mmu_lock);
+ } else
kvm_mmu_slot_remove_write_access(kvm, memslot->id);
- spin_unlock(&kvm->mmu_lock);
- }
+
+ spin_unlock(&kvm->mmu_lock);
}

/*

2012-05-10 17:44:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 29/52] tg3: Avoid panic from reserved statblk field access

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Matt Carlson <[email protected]>

[ Upstream commit f891ea1634ce41f5f47ae40d8594809f4cd2ca66 ]

When RSS is enabled, interrupt vector 0 does not receive any rx traffic.
The rx producer index fields for vector 0's status block should be
considered reserved in this case. This patch changes the code to
respect these reserved fields, which avoids a kernel panic when these
fields take on non-zero values.

Signed-off-by: Matt Carlson <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/broadcom/tg3.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -879,8 +879,13 @@ static inline unsigned int tg3_has_work(
if (sblk->status & SD_STATUS_LINK_CHG)
work_exists = 1;
}
- /* check for RX/TX work to do */
- if (sblk->idx[0].tx_consumer != tnapi->tx_cons ||
+
+ /* check for TX work to do */
+ if (sblk->idx[0].tx_consumer != tnapi->tx_cons)
+ work_exists = 1;
+
+ /* check for RX work to do */
+ if (tnapi->rx_rcb_prod_idx &&
*(tnapi->rx_rcb_prod_idx) != tnapi->rx_rcb_ptr)
work_exists = 1;

@@ -5877,6 +5882,9 @@ static int tg3_poll_work(struct tg3_napi
return work_done;
}

+ if (!tnapi->rx_rcb_prod_idx)
+ return work_done;
+
/* run RX thread, within the bounds set by NAPI.
* All RX "locking" is done by ensuring outside
* code synchronizes with tg3->napi.poll()
@@ -7428,6 +7436,12 @@ static int tg3_alloc_consistent(struct t
*/
switch (i) {
default:
+ if (tg3_flag(tp, ENABLE_RSS)) {
+ tnapi->rx_rcb_prod_idx = NULL;
+ break;
+ }
+ /* Fall through */
+ case 1:
tnapi->rx_rcb_prod_idx = &sblk->idx[0].rx_producer;
break;
case 2:

2012-05-10 17:44:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 18/52] ARM: 7414/1: SMP: prevent use of the console when using idmap_pgd

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Colin Cross <[email protected]>

commit fde165b2a29673aabf18ceff14dea1f1cfb0daad upstream.

Commit 4e8ee7de227e3ab9a72040b448ad728c5428a042 (ARM: SMP: use
idmap_pgd for mapping MMU enable during secondary booting)
switched secondary boot to use idmap_pgd, which is initialized
during early_initcall, instead of a page table initialized during
__cpu_up. This causes idmap_pgd to contain the static mappings
but be missing all dynamic mappings.

If a console is registered that creates a dynamic mapping, the
printk in secondary_start_kernel will trigger a data abort on
the missing mapping before the exception handlers have been
initialized, leading to a hang. Initial boot is not affected
because no consoles have been registered, and resume is usually
not affected because the offending console is suspended.
Onlining a cpu with hotplug triggers the problem.

A workaround is to the printk in secondary_start_kernel until
after the page tables have been switched back to init_mm.

Signed-off-by: Colin Cross <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/kernel/smp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -255,8 +255,6 @@ asmlinkage void __cpuinit secondary_star
struct mm_struct *mm = &init_mm;
unsigned int cpu = smp_processor_id();

- printk("CPU%u: Booted secondary processor\n", cpu);
-
/*
* All kernel threads share the same mm context; grab a
* reference and switch to it.
@@ -268,6 +266,8 @@ asmlinkage void __cpuinit secondary_star
enter_lazy_tlb(mm, current);
local_flush_tlb_all();

+ printk("CPU%u: Booted secondary processor\n", cpu);
+
cpu_init();
preempt_disable();
trace_hardirqs_off();

2012-05-10 17:34:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 25/52] net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Sasha Levin <[email protected]>

[ Upstream commit 84768edbb2721637620b2d84501bb0d5aed603f1 ]

l2tp_ip_sendmsg could return without releasing socket lock, making it all the
way to userspace, and generating the following warning:

[ 130.891594] ================================================
[ 130.894569] [ BUG: lock held when returning to user space! ]
[ 130.897257] 3.4.0-rc5-next-20120501-sasha #104 Tainted: G W
[ 130.900336] ------------------------------------------------
[ 130.902996] trinity/8384 is leaving the kernel with locks still held!
[ 130.906106] 1 lock held by trinity/8384:
[ 130.907924] #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff82b9503f>] l2tp_ip_sendmsg+0x2f/0x550

Introduced by commit 2f16270 ("l2tp: Fix locking in l2tp_ip.c").

Signed-off-by: Sasha Levin <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/l2tp/l2tp_ip.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -441,8 +441,9 @@ static int l2tp_ip_sendmsg(struct kiocb

daddr = lip->l2tp_addr.s_addr;
} else {
+ rc = -EDESTADDRREQ;
if (sk->sk_state != TCP_ESTABLISHED)
- return -EDESTADDRREQ;
+ goto out;

daddr = inet->inet_daddr;
connected = 1;

2012-05-10 17:45:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 26/52] sky2: propogate rx hash when packet is copied

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: stephen hemminger <[email protected]>

[ Upstream commit 3f42941b5d1d13542b1a755a9e4f633aa72e4d3e ]

When a small packet is received, the driver copies it to a new skb to allow
reusing the full size Rx buffer. The copy was propogating the checksum offload
but not the receive hash information. The bug is impact was mostly harmless
and therefore not observed until reviewing this area of code.

Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/marvell/sky2.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -2483,8 +2483,11 @@ static struct sk_buff *receive_copy(stru
skb_copy_from_linear_data(re->skb, skb->data, length);
skb->ip_summed = re->skb->ip_summed;
skb->csum = re->skb->csum;
+ skb->rxhash = re->skb->rxhash;
+
pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
length, PCI_DMA_FROMDEVICE);
+ re->skb->rxhash = 0;
re->skb->ip_summed = CHECKSUM_NONE;
skb_put(skb, length);
}

2012-05-10 17:46:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 22/52] asix: Fix tx transfer padding for full-speed USB

3.3-stable review patch. If anyone has any objections, please let me know.

------------------


From: Ingo van Lil <[email protected]>

[ Upstream commit 2a5809499e35b53a6044fd34e72b242688b7a862 ]

The asix.c USB Ethernet driver avoids ending a tx transfer with a zero-
length packet by appending a four-byte padding to transfers whose length
is a multiple of maxpacket. However, the hard-coded 512 byte maxpacket
length is valid for high-speed USB only; full-speed USB uses 64 byte
packets.

Signed-off-by: Ingo van Lil <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/usb/asix.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -403,7 +403,7 @@ static struct sk_buff *asix_tx_fixup(str
u32 packet_len;
u32 padbytes = 0xffff0000;

- padlen = ((skb->len + 4) % 512) ? 0 : 4;
+ padlen = ((skb->len + 4) & (dev->maxpacket - 1)) ? 0 : 4;

if ((!skb_cloned(skb)) &&
((headroom + tailroom) >= (4 + padlen))) {
@@ -425,7 +425,7 @@ static struct sk_buff *asix_tx_fixup(str
cpu_to_le32s(&packet_len);
skb_copy_to_linear_data(skb, &packet_len, sizeof(packet_len));

- if ((skb->len % 512) == 0) {
+ if (padlen) {
cpu_to_le32s(&padbytes);
memcpy(skb_tail_pointer(skb), &padbytes, sizeof(padbytes));
skb_put(skb, sizeof(padbytes));

2012-05-10 17:46:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 20/52] ARM: orion5x: Fix GPIO enable bits for MPP9

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <[email protected]>

commit 48d99f47a81a66bdd61a348c7fe8df5a7afdf5f3 upstream.

Commit 554cdaefd1cf7bb54b209c4e68c7cec87ce442a9 ('ARM: orion5x: Refactor
mpp code to use common orion platform mpp.') seems to have accidentally
inverted the GPIO valid bits for MPP9 (only). For the mv2120 platform
which uses MPP9 as a GPIO LED device, this results in the error:

[ 12.711476] leds-gpio: probe of leds-gpio failed with error -22

Reported-by: Henry von Tresckow <[email protected]>
References: http://bugs.debian.org/667446
Signed-off-by: Ben Hutchings <[email protected]>
Tested-by: Hans Henry von Tresckow <[email protected]>
Signed-off-by: Jason Cooper <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/mach-orion5x/mpp.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm/mach-orion5x/mpp.h
+++ b/arch/arm/mach-orion5x/mpp.h
@@ -65,8 +65,8 @@
#define MPP8_GIGE MPP(8, 0x1, 0, 0, 1, 1, 1)

#define MPP9_UNUSED MPP(9, 0x0, 0, 0, 1, 1, 1)
-#define MPP9_GPIO MPP(9, 0x0, 0, 0, 1, 1, 1)
-#define MPP9_GIGE MPP(9, 0x1, 1, 1, 1, 1, 1)
+#define MPP9_GPIO MPP(9, 0x0, 1, 1, 1, 1, 1)
+#define MPP9_GIGE MPP(9, 0x1, 0, 0, 1, 1, 1)

#define MPP10_UNUSED MPP(10, 0x0, 0, 0, 1, 1, 1)
#define MPP10_GPIO MPP(10, 0x0, 1, 1, 1, 1, 1)

2012-05-10 17:34:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 17/52] ARM: 7412/1: audit: use only AUDIT_ARCH_ARM regardless of endianness

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Will Deacon <[email protected]>

commit 2f978366984a418f38fcf44137be1fbc5a89cfd9 upstream.

The machine endianness has no direct correspondence to the syscall ABI,
so use only AUDIT_ARCH_ARM when identifying the ABI to the audit tools
in userspace.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/kernel/ptrace.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -905,12 +905,6 @@ long arch_ptrace(struct task_struct *chi
return ret;
}

-#ifdef __ARMEB__
-#define AUDIT_ARCH_NR AUDIT_ARCH_ARMEB
-#else
-#define AUDIT_ARCH_NR AUDIT_ARCH_ARM
-#endif
-
asmlinkage int syscall_trace(int why, struct pt_regs *regs, int scno)
{
unsigned long ip;
@@ -918,7 +912,7 @@ asmlinkage int syscall_trace(int why, st
if (why)
audit_syscall_exit(regs);
else
- audit_syscall_entry(AUDIT_ARCH_NR, scno, regs->ARM_r0,
+ audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);

if (!test_thread_flag(TIF_SYSCALL_TRACE))

2012-05-10 17:47:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 09/52] ASoC: tlv312aic23: unbreak resume

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Bénard <[email protected]>

commit e875c1e3e758447ba81ca450d89434b3b0496d37 upstream.

* commit f9dfbf9 "ASoC: tlv320aic23: convert to soc-cache" leads to
a bug preventing resumeof the codec as regmap expects a 9 bits data
register but 0xFFFF is passed in tlv320aic23_set_bias_level and this
values gets cached preventing any write to the TLV320AIC23_PWR
register as the final value produced by regmap is (register << 9) | value

* this patch solves the problem by only working on the 9 bits the
register contains.

Signed-off-by: Eric Bénard <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/codecs/tlv320aic23.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/sound/soc/codecs/tlv320aic23.c
+++ b/sound/soc/codecs/tlv320aic23.c
@@ -472,7 +472,7 @@ static int tlv320aic23_set_dai_sysclk(st
static int tlv320aic23_set_bias_level(struct snd_soc_codec *codec,
enum snd_soc_bias_level level)
{
- u16 reg = snd_soc_read(codec, TLV320AIC23_PWR) & 0xff7f;
+ u16 reg = snd_soc_read(codec, TLV320AIC23_PWR) & 0x17f;

switch (level) {
case SND_SOC_BIAS_ON:
@@ -491,7 +491,7 @@ static int tlv320aic23_set_bias_level(st
case SND_SOC_BIAS_OFF:
/* everything off, dac mute, inactive */
snd_soc_write(codec, TLV320AIC23_ACTIVE, 0x0);
- snd_soc_write(codec, TLV320AIC23_PWR, 0xffff);
+ snd_soc_write(codec, TLV320AIC23_PWR, 0x1ff);
break;
}
codec->dapm.bias_level = level;

2012-05-10 17:47:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 16/52] ARM: 7411/1: audit: fix treatment of saved ip register during syscall tracing

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Will Deacon <[email protected]>

commit 6a68b6f574c8ad2c1d90f0db8fd95b8abe8a0a73 upstream.

The ARM audit code incorrectly uses the saved application ip register
value to infer syscall entry or exit. Additionally, the saved value will
be clobbered if the current task is not being traced, which can lead to
libc corruption if ip is live (apparently glibc uses it for the TLS
pointer).

This patch fixes the syscall tracing code so that the why parameter is
used to infer the syscall direction and the saved ip is only updated if
we know that we will be signalling a ptrace trap.

Reported-and-Tested-by: Jon Masters <[email protected]>

Cc: Eric Paris <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/kernel/ptrace.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -915,14 +915,7 @@ asmlinkage int syscall_trace(int why, st
{
unsigned long ip;

- /*
- * Save IP. IP is used to denote syscall entry/exit:
- * IP = 0 -> entry, = 1 -> exit
- */
- ip = regs->ARM_ip;
- regs->ARM_ip = why;
-
- if (!ip)
+ if (why)
audit_syscall_exit(regs);
else
audit_syscall_entry(AUDIT_ARCH_NR, scno, regs->ARM_r0,
@@ -935,6 +928,13 @@ asmlinkage int syscall_trace(int why, st

current_thread_info()->syscall = scno;

+ /*
+ * IP is used to denote syscall entry/exit:
+ * IP = 0 -> entry, =1 -> exit
+ */
+ ip = regs->ARM_ip;
+ regs->ARM_ip = why;
+
/* the 0x80 provides a way for the tracing parent to distinguish
between a syscall stop and SIGTRAP delivery */
ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD)

2012-05-10 17:34:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 13/52] asm-generic: Use __BITS_PER_LONG in statfs.h

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: "H. Peter Anvin" <[email protected]>

commit f5c2347ee20a8d6964d6a6b1ad04f200f8d4dfa7 upstream.

<asm-generic/statfs.h> is exported to userspace, so using
BITS_PER_LONG is invalid. We need to use __BITS_PER_LONG instead.

This is kernel bugzilla 43165.

Reported-by: H.J. Lu <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
include/asm-generic/statfs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/asm-generic/statfs.h
+++ b/include/asm-generic/statfs.h
@@ -15,7 +15,7 @@ typedef __kernel_fsid_t fsid_t;
* with a 10' pole.
*/
#ifndef __statfs_word
-#if BITS_PER_LONG == 64
+#if __BITS_PER_LONG == 64
#define __statfs_word long
#else
#define __statfs_word __u32

2012-05-10 17:47:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 15/52] ARM: 7410/1: Add extra clobber registers for assembly in kernel_execve

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tim Bird <[email protected]>

commit e787ec1376e862fcea1bfd523feb7c5fb43ecdb9 upstream.

The inline assembly in kernel_execve() uses r8 and r9. Since this
code sequence does not return, it usually doesn't matter if the
register clobber list is accurate. However, I saw a case where a
particular version of gcc used r8 as an intermediate for the value
eventually passed to r9. Because r8 is used in the inline
assembly, and not mentioned in the clobber list, r9 was set
to an incorrect value.

This resulted in a kernel panic on execution of the first user-space
program in the system. r9 is used in ret_to_user as the thread_info
pointer, and if it's wrong, bad things happen.

Signed-off-by: Tim Bird <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/kernel/sys_arm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/kernel/sys_arm.c
+++ b/arch/arm/kernel/sys_arm.c
@@ -115,7 +115,7 @@ int kernel_execve(const char *filename,
"Ir" (THREAD_START_SP - sizeof(regs)),
"r" (&regs),
"Ir" (sizeof(regs))
- : "r0", "r1", "r2", "r3", "ip", "lr", "memory");
+ : "r0", "r1", "r2", "r3", "r8", "r9", "ip", "lr", "memory");

out:
return ret;

2012-05-10 17:48:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 12/52] percpu, x86: dont use PMD_SIZE as embedded atom_size on 32bit

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tejun Heo <[email protected]>

commit d5e28005a1d2e67833852f4c9ea8ec206ea3ff85 upstream.

With the embed percpu first chunk allocator, x86 uses either PAGE_SIZE
or PMD_SIZE for atom_size. PMD_SIZE is used when CPU supports PSE so
that percpu areas are aligned to PMD mappings and possibly allow using
PMD mappings in vmalloc areas in the future. Using larger atom_size
doesn't waste actual memory; however, it does require larger vmalloc
space allocation later on for !first chunks.

With reasonably sized vmalloc area, PMD_SIZE shouldn't be a problem
but x86_32 at this point is anything but reasonable in terms of
address space and using larger atom_size reportedly leads to frequent
percpu allocation failures on certain setups.

As there is no reason to not use PMD_SIZE on x86_64 as vmalloc space
is aplenty and most x86_64 configurations support PSE, fix the issue
by always using PMD_SIZE on x86_64 and PAGE_SIZE on x86_32.

v2: drop cpu_has_pse test and make x86_64 always use PMD_SIZE and
x86_32 PAGE_SIZE as suggested by hpa.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Yanmin Zhang <[email protected]>
Reported-by: ShuoX Liu <[email protected]>
Acked-by: H. Peter Anvin <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/setup_percpu.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -185,10 +185,22 @@ void __init setup_per_cpu_areas(void)
#endif
rc = -EINVAL;
if (pcpu_chosen_fc != PCPU_FC_PAGE) {
- const size_t atom_size = cpu_has_pse ? PMD_SIZE : PAGE_SIZE;
const size_t dyn_size = PERCPU_MODULE_RESERVE +
PERCPU_DYNAMIC_RESERVE - PERCPU_FIRST_CHUNK_RESERVE;
+ size_t atom_size;

+ /*
+ * On 64bit, use PMD_SIZE for atom_size so that embedded
+ * percpu areas are aligned to PMD. This, in the future,
+ * can also allow using PMD mappings in vmalloc area. Use
+ * PAGE_SIZE on 32bit as vmalloc space is highly contended
+ * and large vmalloc area allocs can easily fail.
+ */
+#ifdef CONFIG_X86_64
+ atom_size = PMD_SIZE;
+#else
+ atom_size = PAGE_SIZE;
+#endif
rc = pcpu_embed_first_chunk(PERCPU_FIRST_CHUNK_RESERVE,
dyn_size, atom_size,
pcpu_cpu_distance,

2012-05-10 17:48:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 11/52] x86, relocs: Remove an unused variable

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kusanagi Kouichi <[email protected]>

commit 7c77cda0fe742ed07622827ce80963bbeebd1e3f upstream.

sh_symtab is set but not used.

[ hpa: putting this in urgent because of the sheer harmlessness of the patch:
it quiets a build warning but does not change any generated code. ]

Signed-off-by: Kusanagi Kouichi <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/boot/compressed/relocs.c | 2 --
1 file changed, 2 deletions(-)

--- a/arch/x86/boot/compressed/relocs.c
+++ b/arch/x86/boot/compressed/relocs.c
@@ -402,13 +402,11 @@ static void print_absolute_symbols(void)
for (i = 0; i < ehdr.e_shnum; i++) {
struct section *sec = &secs[i];
char *sym_strtab;
- Elf32_Sym *sh_symtab;
int j;

if (sec->shdr.sh_type != SHT_SYMTAB) {
continue;
}
- sh_symtab = sec->symtab;
sym_strtab = sec->link->strtab;
for (j = 0; j < sec->shdr.sh_size/sizeof(Elf32_Sym); j++) {
Elf32_Sym *sym;

2012-05-10 17:33:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 08/52] ASoC: core: check of_property_count_strings failure

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Richard Zhao <[email protected]>

commit c34ce320d9fe328e3272def20b152f39ccfa045e upstream.

Signed-off-by: Richard Zhao <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
sound/soc/soc-core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -3420,10 +3420,10 @@ int snd_soc_of_parse_audio_routing(struc
int i, ret;

num_routes = of_property_count_strings(np, propname);
- if (num_routes & 1) {
+ if (num_routes < 0 || num_routes & 1) {
dev_err(card->dev,
- "Property '%s's length is not even\n",
- propname);
+ "Property '%s' does not exist or its length is not even\n",
+ propname);
return -EINVAL;
}
num_routes /= 2;

2012-05-10 17:49:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 10/52] fs/cifs: fix parsing of dfs referrals

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Stefan Metzmacher <[email protected]>

commit d8f2799b105a24bb0bbd3380a0d56e6348484058 upstream.

The problem was that the first referral was parsed more than once
and so the caller tried the same referrals multiple times.

The problem was introduced partly by commit
066ce6899484d9026acd6ba3a8dbbedb33d7ae1b,
where 'ref += le16_to_cpu(ref->Size);' got lost,
but that was also wrong...

Signed-off-by: Stefan Metzmacher <[email protected]>
Tested-by: Björn Jacke <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/cifs/cifssmb.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -4831,8 +4831,12 @@ parse_DFS_referrals(TRANSACTION2_GET_DFS
max_len = data_end - temp;
node->node_name = cifs_strndup_from_utf16(temp, max_len,
is_unicode, nls_codepage);
- if (!node->node_name)
+ if (!node->node_name) {
rc = -ENOMEM;
+ goto parse_DFS_referrals_exit;
+ }
+
+ ref++;
}

parse_DFS_referrals_exit:

2012-05-10 17:49:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 06/52] drm/i915: disable sdvo hotplug on i945g/gm

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Vetter <[email protected]>

commit 768b107e4b3be0acf6f58e914afe4f337c00932b upstream.

Chris Wilson dug out a hw erratum saying that there's noise on the
interrupt line on i945G chips. We also have a bug report from a i945GM
chip with an sdvo hotplug interrupt storm (and no apparent cause).

Play it safe and disable sdvo hotplug on all i945 variants.

Note that this is a regression that has been introduced in 3.1,
when we've enabled sdvo hotplug support with

commit cc68c81aed7d892deaf12d720d5455208e94cd0a
Author: Simon Farnsworth <[email protected]>
Date: Wed Sep 21 17:13:30 2011 +0100

drm/i915: Enable SDVO hotplug interrupts for HDMI and DVI

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=38442
Reported-and-tested-by: Dominik Köppl <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-Off-by: Daniel Vetter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/i915/intel_sdvo.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/drivers/gpu/drm/i915/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/intel_sdvo.c
@@ -1221,8 +1221,14 @@ static bool intel_sdvo_get_capabilities(

static int intel_sdvo_supports_hotplug(struct intel_sdvo *intel_sdvo)
{
+ struct drm_device *dev = intel_sdvo->base.base.dev;
u8 response[2];

+ /* HW Erratum: SDVO Hotplug is broken on all i945G chips, there's noise
+ * on the line. */
+ if (IS_I945G(dev) || IS_I945GM(dev))
+ return false;
+
return intel_sdvo_get_value(intel_sdvo, SDVO_CMD_GET_HOT_PLUG_SUPPORT,
&response, 2) && response[0];
}

2012-05-10 17:50:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [ 03/52] e1000: fix vlan processing regression

3.3-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Pirko <[email protected]>

commit 52f5509fe8ccb607ff9b84ad618f244262336475 upstream.

This patch fixes a regression introduced by commit "e1000: do vlan
cleanup (799d531)".

Apparently some e1000 chips (not mine) are sensitive about the order of
setting vlan filter and vlan stripping/inserting functionality. So this
patch changes the order so it's the same as before vlan cleanup.

Reported-by: Ben Greear <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Tested-by: Ben Greear <[email protected]>
Tested-by: Aaron Brown <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Cc: David Ward <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/ethernet/intel/e1000/e1000_main.c | 35 ++++++++++++++++----------
1 file changed, 22 insertions(+), 13 deletions(-)

--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -164,6 +164,8 @@ static int e1000_82547_fifo_workaround(s
static bool e1000_vlan_used(struct e1000_adapter *adapter);
static void e1000_vlan_mode(struct net_device *netdev,
netdev_features_t features);
+static void e1000_vlan_filter_on_off(struct e1000_adapter *adapter,
+ bool filter_on);
static int e1000_vlan_rx_add_vid(struct net_device *netdev, u16 vid);
static int e1000_vlan_rx_kill_vid(struct net_device *netdev, u16 vid);
static void e1000_restore_vlan(struct e1000_adapter *adapter);
@@ -1213,7 +1215,7 @@ static int __devinit e1000_probe(struct
if (err)
goto err_register;

- e1000_vlan_mode(netdev, netdev->features);
+ e1000_vlan_filter_on_off(adapter, false);

/* print bus type/speed/width info */
e_info(probe, "(PCI%s:%dMHz:%d-bit) %pM\n",
@@ -4549,6 +4551,22 @@ static bool e1000_vlan_used(struct e1000
return false;
}

+static void __e1000_vlan_mode(struct e1000_adapter *adapter,
+ netdev_features_t features)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ u32 ctrl;
+
+ ctrl = er32(CTRL);
+ if (features & NETIF_F_HW_VLAN_RX) {
+ /* enable VLAN tag insert/strip */
+ ctrl |= E1000_CTRL_VME;
+ } else {
+ /* disable VLAN tag insert/strip */
+ ctrl &= ~E1000_CTRL_VME;
+ }
+ ew32(CTRL, ctrl);
+}
static void e1000_vlan_filter_on_off(struct e1000_adapter *adapter,
bool filter_on)
{
@@ -4558,6 +4576,7 @@ static void e1000_vlan_filter_on_off(str
if (!test_bit(__E1000_DOWN, &adapter->flags))
e1000_irq_disable(adapter);

+ __e1000_vlan_mode(adapter, adapter->netdev->features);
if (filter_on) {
/* enable VLAN receive filtering */
rctl = er32(RCTL);
@@ -4578,24 +4597,14 @@ static void e1000_vlan_filter_on_off(str
}

static void e1000_vlan_mode(struct net_device *netdev,
- netdev_features_t features)
+ netdev_features_t features)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
- struct e1000_hw *hw = &adapter->hw;
- u32 ctrl;

if (!test_bit(__E1000_DOWN, &adapter->flags))
e1000_irq_disable(adapter);

- ctrl = er32(CTRL);
- if (features & NETIF_F_HW_VLAN_RX) {
- /* enable VLAN tag insert/strip */
- ctrl |= E1000_CTRL_VME;
- } else {
- /* disable VLAN tag insert/strip */
- ctrl &= ~E1000_CTRL_VME;
- }
- ew32(CTRL, ctrl);
+ __e1000_vlan_mode(adapter, features);

if (!test_bit(__E1000_DOWN, &adapter->flags))
e1000_irq_enable(adapter);

2012-05-10 18:25:37

by Mark Lord

[permalink] [raw]
Subject: Re: [ 22/52] asix: Fix tx transfer padding for full-speed USB

On 12-05-10 01:31 PM, Greg KH wrote:
> 3.3-stable review patch. If anyone has any objections, please let me know.
..
> From: Ingo van Lil <[email protected]>
> [ Upstream commit 2a5809499e35b53a6044fd34e72b242688b7a862 ]
>
> The asix.c USB Ethernet driver avoids ending a tx transfer with a zero-
> length packet by appending a four-byte padding to transfers whose length
> is a multiple of maxpacket. However, the hard-coded 512 byte maxpacket
> length is valid for high-speed USB only; full-speed USB uses 64 byte
> packets.
>
> Signed-off-by: Ingo van Lil <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
> drivers/net/usb/asix.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/drivers/net/usb/asix.c
> +++ b/drivers/net/usb/asix.c
> @@ -403,7 +403,7 @@ static struct sk_buff *asix_tx_fixup(str
> u32 packet_len;
> u32 padbytes = 0xffff0000;
>
> - padlen = ((skb->len + 4) % 512) ? 0 : 4;
> + padlen = ((skb->len + 4) & (dev->maxpacket - 1)) ? 0 : 4;
>
> if ((!skb_cloned(skb)) &&
> ((headroom + tailroom) >= (4 + padlen))) {
> @@ -425,7 +425,7 @@ static struct sk_buff *asix_tx_fixup(str
> cpu_to_le32s(&packet_len);
> skb_copy_to_linear_data(skb, &packet_len, sizeof(packet_len));
>
> - if ((skb->len % 512) == 0) {
> + if (padlen) {
> cpu_to_le32s(&padbytes);
> memcpy(skb_tail_pointer(skb), &padbytes, sizeof(padbytes));
> skb_put(skb, sizeof(padbytes));
>
>

This patch changes behaviour even for high-speed USB.
Was this intentional, and why?

Thanks

2012-05-10 18:29:09

by Mark Lord

[permalink] [raw]
Subject: Re: [ 22/52] asix: Fix tx transfer padding for full-speed USB

On 12-05-10 02:25 PM, Mark Lord wrote:
> On 12-05-10 01:31 PM, Greg KH wrote:
>> 3.3-stable review patch. If anyone has any objections, please let me know.
> ..
>> From: Ingo van Lil <[email protected]>
>> [ Upstream commit 2a5809499e35b53a6044fd34e72b242688b7a862 ]
>>
>> The asix.c USB Ethernet driver avoids ending a tx transfer with a zero-
>> length packet by appending a four-byte padding to transfers whose length
>> is a multiple of maxpacket. However, the hard-coded 512 byte maxpacket
>> length is valid for high-speed USB only; full-speed USB uses 64 byte
>> packets.
>>
>> Signed-off-by: Ingo van Lil <[email protected]>
>> Signed-off-by: David S. Miller <[email protected]>
>> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>> ---
>> drivers/net/usb/asix.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> --- a/drivers/net/usb/asix.c
>> +++ b/drivers/net/usb/asix.c
>> @@ -403,7 +403,7 @@ static struct sk_buff *asix_tx_fixup(str
>> u32 packet_len;
>> u32 padbytes = 0xffff0000;
>>
>> - padlen = ((skb->len + 4) % 512) ? 0 : 4;
>> + padlen = ((skb->len + 4) & (dev->maxpacket - 1)) ? 0 : 4;
>>
>> if ((!skb_cloned(skb)) &&
>> ((headroom + tailroom) >= (4 + padlen))) {
>> @@ -425,7 +425,7 @@ static struct sk_buff *asix_tx_fixup(str
>> cpu_to_le32s(&packet_len);
>> skb_copy_to_linear_data(skb, &packet_len, sizeof(packet_len));
>>
>> - if ((skb->len % 512) == 0) {
>> + if (padlen) {
>> cpu_to_le32s(&padbytes);
>> memcpy(skb_tail_pointer(skb), &padbytes, sizeof(padbytes));
>> skb_put(skb, sizeof(padbytes));
>>
>>
>
> This patch changes behaviour even for high-speed USB.
> Was this intentional, and why?

Never mind.. I missed the skb_push(skb, 4) line
that the diff doesn't show above.

Okay by me.

2012-05-10 18:52:33

by David Miller

[permalink] [raw]
Subject: Re: [ 22/52] asix: Fix tx transfer padding for full-speed USB

From: Mark Lord <[email protected]>
Date: Thu, 10 May 2012 14:29:03 -0400

> Never mind.. I missed the skb_push(skb, 4) line
> that the diff doesn't show above.
>
> Okay by me.

Do me a favor and provide this kind of feedback when the patch
gets originally submitted, not some time later when it happens
to make it's way into a -stable submission.

2012-05-10 18:53:30

by Duyck, Alexander H

[permalink] [raw]
Subject: Re: [ 33/52] net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state

On 05/10/2012 10:32 AM, Greg KH wrote:
> 3.3-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
>
> From: Alexander Duyck <[email protected]>
>
> [ Upstream commit 5c4903549c05bbb373479e0ce2992573c120654a ]
>
> We are seeing dev_watchdog hangs on several drivers. I suspect this is due
> to the __QUEUE_STATE_STACK_XOFF bit being set prior to a reset for link
> change, and then not being cleared by netdev_tx_reset_queue. This change
> corrects that.
>
> In addition we were seeing dev_watchdog hangs on igb after running the
> ethtool tests. We found this to be due to the fact that the ethtool test
> runs the same logic as ndo_start_xmit, but we were never clearing the XOFF
> flag since the loopback test in ethtool does not do byte queue accounting.
>
> Signed-off-by: Alexander Duyck <[email protected]>
> Tested-by: Stephen Ko <[email protected]>
> Signed-off-by: Jeff Kirsher <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> ---
> drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
> include/linux/netdevice.h | 1 +
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -2750,6 +2750,8 @@ void igb_configure_tx_ring(struct igb_ad
>
> txdctl |= E1000_TXDCTL_QUEUE_ENABLE;
> wr32(E1000_TXDCTL(reg_idx), txdctl);
> +
> + netdev_tx_reset_queue(txring_txq(ring));
> }
>
> /**
> @@ -3242,7 +3244,6 @@ static void igb_clean_tx_ring(struct igb
> buffer_info = &tx_ring->tx_buffer_info[i];
> igb_unmap_and_free_tx_resource(tx_ring, buffer_info);
> }
> - netdev_tx_reset_queue(txring_txq(tx_ring));
>
> size = sizeof(struct igb_tx_buffer) * tx_ring->count;
> memset(tx_ring->tx_buffer_info, 0, size);
All of the changes in igb are unnecessary. I think they can be dropped
from the patch as they have already been reverted from the latest
net-next. The only change that is really needed is the piece in
netdevice.h below.
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1955,6 +1955,7 @@ static inline void netdev_completed_queu
> static inline void netdev_tx_reset_queue(struct netdev_queue *q)
> {
> #ifdef CONFIG_BQL
> + clear_bit(__QUEUE_STATE_STACK_XOFF, &q->state);
> dql_reset(&q->dql);
> #endif
> }
>
>
Thanks,

Alex

2012-05-10 18:55:23

by David Miller

[permalink] [raw]
Subject: Re: [ 33/52] net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state

From: Alexander Duyck <[email protected]>
Date: Thu, 10 May 2012 11:53:27 -0700

> All of the changes in igb are unnecessary. I think they can be dropped
> from the patch as they have already been reverted from the latest
> net-next. The only change that is really needed is the piece in
> netdevice.h below.

Does it hurt? If not, just let it be.

2012-05-10 19:47:16

by Jonathan Nieder

[permalink] [raw]
Subject: Re: [ 33/52] net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state

David Miller wrote:
> From: Alexander Duyck <[email protected]>

>> All of the changes in igb are unnecessary. I think they can be dropped
>> from the patch as they have already been reverted from the latest
>> net-next. The only change that is really needed is the piece in
>> netdevice.h below.
>
> Does it hurt? If not, just let it be.

Based on the commit description for dad8a3b3eaa0 ("igb, ixgbe:
netdev_tx_reset_queue incorrectly called from tx init path"), yes, it
hurts. I don't know if it hurts enough to justify holding up this
patch until dad8a3b3eaa0 hits mainline.

| this sort of works in most cases except
| when the number of real tx queues changes. When the number of real
| tx queues changes netdev_tx_reset_queue() only gets called on the
| new number of queues so when we reduce the number of queues we risk
| triggering the watchdog timer and repeated device resets.
|
| So this is not only a cosmetic issue but causes real bugs. For
| example enabling/disabling DCB or FCoE in ixgbe will trigger this.

2012-05-10 20:35:51

by Duyck, Alexander H

[permalink] [raw]
Subject: Re: [ 33/52] net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state

On 05/10/2012 12:46 PM, Jonathan Nieder wrote:
> David Miller wrote:
>> From: Alexander Duyck <[email protected]>
>>> All of the changes in igb are unnecessary. I think they can be dropped
>>> from the patch as they have already been reverted from the latest
>>> net-next. The only change that is really needed is the piece in
>>> netdevice.h below.
>> Does it hurt? If not, just let it be.
> Based on the commit description for dad8a3b3eaa0 ("igb, ixgbe:
> netdev_tx_reset_queue incorrectly called from tx init path"), yes, it
> hurts. I don't know if it hurts enough to justify holding up this
> patch until dad8a3b3eaa0 hits mainline.
>
> | this sort of works in most cases except
> | when the number of real tx queues changes. When the number of real
> | tx queues changes netdev_tx_reset_queue() only gets called on the
> | new number of queues so when we reduce the number of queues we risk
> | triggering the watchdog timer and repeated device resets.
> |
> | So this is not only a cosmetic issue but causes real bugs. For
> | example enabling/disabling DCB or FCoE in ixgbe will trigger this.
I'm fairly certain we really shouldn't make the changes to igb. I would
say to just modify the existing patch by just dropping the igb portion
of it and it would be fine.

Thanks,

Alex

2012-05-10 20:51:35

by David Miller

[permalink] [raw]
Subject: Re: [ 33/52] net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state

From: Alexander Duyck <[email protected]>
Date: Thu, 10 May 2012 13:35:49 -0700

> On 05/10/2012 12:46 PM, Jonathan Nieder wrote:
>> David Miller wrote:
>>> From: Alexander Duyck <[email protected]>
>>>> All of the changes in igb are unnecessary. I think they can be dropped
>>>> from the patch as they have already been reverted from the latest
>>>> net-next. The only change that is really needed is the piece in
>>>> netdevice.h below.
>>> Does it hurt? If not, just let it be.
>> Based on the commit description for dad8a3b3eaa0 ("igb, ixgbe:
>> netdev_tx_reset_queue incorrectly called from tx init path"), yes, it
>> hurts. I don't know if it hurts enough to justify holding up this
>> patch until dad8a3b3eaa0 hits mainline.
>>
>> | this sort of works in most cases except
>> | when the number of real tx queues changes. When the number of real
>> | tx queues changes netdev_tx_reset_queue() only gets called on the
>> | new number of queues so when we reduce the number of queues we risk
>> | triggering the watchdog timer and repeated device resets.
>> |
>> | So this is not only a cosmetic issue but causes real bugs. For
>> | example enabling/disabling DCB or FCoE in ixgbe will trigger this.
> I'm fairly certain we really shouldn't make the changes to igb. I would
> say to just modify the existing patch by just dropping the igb portion
> of it and it would be fine.

Ok, Greg, could you do me a huge favor and simply trim the IGB driver
changes from this patch and just keep the linux/netdevice.h part?

2012-05-10 22:03:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [ 33/52] net: Fix issue with netdev_tx_reset_queue not resetting queue from XOFF state

On Thu, May 10, 2012 at 04:51:20PM -0400, David Miller wrote:
> From: Alexander Duyck <[email protected]>
> Date: Thu, 10 May 2012 13:35:49 -0700
>
> > On 05/10/2012 12:46 PM, Jonathan Nieder wrote:
> >> David Miller wrote:
> >>> From: Alexander Duyck <[email protected]>
> >>>> All of the changes in igb are unnecessary. I think they can be dropped
> >>>> from the patch as they have already been reverted from the latest
> >>>> net-next. The only change that is really needed is the piece in
> >>>> netdevice.h below.
> >>> Does it hurt? If not, just let it be.
> >> Based on the commit description for dad8a3b3eaa0 ("igb, ixgbe:
> >> netdev_tx_reset_queue incorrectly called from tx init path"), yes, it
> >> hurts. I don't know if it hurts enough to justify holding up this
> >> patch until dad8a3b3eaa0 hits mainline.
> >>
> >> | this sort of works in most cases except
> >> | when the number of real tx queues changes. When the number of real
> >> | tx queues changes netdev_tx_reset_queue() only gets called on the
> >> | new number of queues so when we reduce the number of queues we risk
> >> | triggering the watchdog timer and repeated device resets.
> >> |
> >> | So this is not only a cosmetic issue but causes real bugs. For
> >> | example enabling/disabling DCB or FCoE in ixgbe will trigger this.
> > I'm fairly certain we really shouldn't make the changes to igb. I would
> > say to just modify the existing patch by just dropping the igb portion
> > of it and it would be fine.
>
> Ok, Greg, could you do me a huge favor and simply trim the IGB driver
> changes from this patch and just keep the linux/netdevice.h part?

Now done, thanks for letting me know.

greg k-h