Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751789AbbBEVem (ORCPT ); Thu, 5 Feb 2015 16:34:42 -0500 Received: from mail-ig0-f181.google.com ([209.85.213.181]:49165 "EHLO mail-ig0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751123AbbBEVel (ORCPT ); Thu, 5 Feb 2015 16:34:41 -0500 MIME-Version: 1.0 In-Reply-To: <54D3DA75.70402@oracle.com> References: <54D2AA16.6030706@oracle.com> <1423169986.6835.24.camel@stgolabs.net> <54D3DA75.70402@oracle.com> Date: Thu, 5 Feb 2015 13:34:40 -0800 X-Google-Sender-Auth: YLMcEqGXeSfjicdDxGArfvVZiVY Message-ID: Subject: Re: sched: memory corruption on completing completions From: Linus Torvalds To: Sasha Levin Cc: Davidlohr Bueso , Waiman Long , Peter Zijlstra , Ingo Molnar , Andrew Morton , Andrey Ryabinin , Dave Jones , LKML , Raghavendra K T Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2553 Lines: 54 On Thu, Feb 5, 2015 at 1:02 PM, Sasha Levin wrote: > > Interestingly enough, according to that article this behaviour seems to be > "by design": Oh, it's definitely by design, it's just that the design looked at spinlocks without the admittedly very subtle issue of lifetime vs unlocking. Spinlocks (and completions) are special - for other locks we have basically allowed lifetimes to be separate from the lock state, and if you have a data structure with a mutex in it, you'll have to have some separate lifetime rule outside of the lock itself. But spinlocks and completions have their locking state tied into their lifetime. Completions are so very much by design (because dynamic completions on the stack is one of the core use cases), and spinlocks do it because in some cases you cannot sanely avoid it (and one of those cases is the implementation of completions - they aren't actually first-class locking primitives of their own, although they actually *used* to be, originally). It is possible that the paravirt spinlocks could be saved by: - moving the clearing of TICKET_SLOWPATH_FLAG into the fastpath locking code. - making sure that the "unlock" path never does a *write* to the possibly stale lock. KASan would still complain about the read, but we could just say that it's a speculative read - bad form, but not disastrous. but right now they are clearly horribly broken. Because the unlock path doesn't just read the value, it can end up writing to it too. Looking at the Xen version, for example, the unlock_kick part looks fine. It just compares the pointer to the lock against its data structures, and doesn't touch the lock itself. Which is fine. But right now, the real problem is that "__ticket_unlock_slowpath()" will clear the TICKET_SLOWPATH_FLAG on a lock that has already been unlocked - which means that it may actually *change* a byte that has already been freed. It might not be a spinlock any more, it might have been reused for something else, and might be anything else, and clearing TICKET_SLOWPATH_FLAG might be clearing a random bit in a kernel pointer or other value.. It is *very* hard to hit, though. And it obviously doesn't happen in raw hardware, but paravirt people need to think hard about this. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/