From: Ondrej Mosnacek
Date: Mon, 1 Mar 2021 11:35:56 +0100
Subject: Re: [BUG] Race between policy reload sidtab conversion and live conversion
To: Paul Moore
Cc: Tyler Hicks, Stephen Smalley, SElinux list, Linux kernel mailing list

On Sun, Feb 28, 2021 at 8:21 PM Paul Moore wrote:
> On Fri, Feb 26, 2021 at 6:12 AM Ondrej Mosnacek wrote:
> > On Fri, Feb 26, 2021 at 2:07 AM Paul Moore wrote:
> > > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek wrote:
> > > > After the switch to RCU, we now have:
> > > > 1. Start live conversion of new entries.
> > > > 2. Convert existing entries.
> > > > 3. RCU-assign the new policy pointer to selinux_state.
> > > > [!!! Now both the old and the new sidtab may be referenced by
> > > > readers, since the synchronization barrier previously provided
> > > > by the write lock is gone.]
> > > > 4. Wait for synchronize_rcu() to return.
> > > > 5. Now only the new sidtab is visible to readers, so the old one
> > > > can be destroyed.
> > > >
> > > > So the race can happen between 3. and 5., if one thread already
> > > > sees the new sidtab and adds a new entry there, while a second
> > > > thread still has a reference to the old sidtab and also tries to
> > > > add a new entry, live-converting it into the new sidtab, which it
> > > > doesn't expect to change by itself. Unfortunately I failed to
> > > > realize this when reviewing the patch :/
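To make the window between 3. and 5. explicit, the sequence above
corresponds roughly to the sketch below. This is only an illustration -
the sidtab_* helpers, the policy layout and policy_mutex are
approximations, not the actual code; only rcu_assign_pointer(),
synchronize_rcu() and the rcu_dereference_protected() idiom are the
real primitives:

static void policy_reload_sequence_sketch(struct selinux_state *state,
                                          struct selinux_policy *newpolicy)
{
        struct selinux_policy *oldpolicy =
                rcu_dereference_protected(state->policy,
                                          lockdep_is_held(&state->policy_mutex));

        /* 1. + 2.: set up the live conversion, convert existing entries */
        sidtab_start_live_convert(oldpolicy->sidtab, newpolicy->sidtab);
        sidtab_convert_existing(oldpolicy->sidtab, newpolicy->sidtab);

        /* 3.: publish the new policy pointer */
        rcu_assign_pointer(state->policy, newpolicy);

        /*
         * Window: until synchronize_rcu() returns, one reader may still
         * hold oldpolicy and add entries through the old sidtab
         * (live-converting them into the new one), while another reader
         * already sees newpolicy and adds entries directly to the new
         * sidtab.  Nothing serializes those two against each other.
         */

        /* 4.: wait for all pre-existing RCU readers to finish */
        synchronize_rcu();

        /* 5.: only the new sidtab is reachable now, destroy the old one */
        sidtab_destroy(oldpolicy->sidtab);
}

Everything between the rcu_assign_pointer() and the return of
synchronize_rcu() is the window in which two readers can be adding
entries through the two different sidtabs at the same time.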
> > > It is possible I'm not fully understanding the problem and/or
> > > missing an important detail - it is rather tricky code, and RCU
> > > can be very hard to reason about at times - but I think we may be
> > > able to solve this with some lock fixes inside
> > > sidtab_context_to_sid(). Let me try to explain to see if we are on
> > > the same page here ...
> > >
> > > The problem is when we have two (or more) threads trying to
> > > add/convert the same context into a sid; the task with new_sidtab
> > > is looking to add a new sidtab entry, while the task with
> > > old_sidtab is looking to convert an entry in old_sidtab into a new
> > > entry in new_sidtab. Boom.
> > >
> > > Looking at the code in sidtab_context_to_sid(), when we have two
> > > sidtabs that are currently active (the old_sidtab->convert pointer
> > > is valid) and a task with old_sidtab attempts to add a new entry
> > > to both sidtabs, it first adds it to the old sidtab and then also
> > > adds it to the new sidtab. I believe the problem is that in this
> > > case, while the task grabs the old_sidtab->lock, it never grabs
> > > the new_sidtab->lock, which allows it to race with tasks that
> > > already see only new_sidtab. I think adding code to
> > > sidtab_context_to_sid() which grabs the new_sidtab->lock when
> > > adding entries to the new_sidtab *should* solve the problem.
> > >
> > > Did I miss something important? ;)
> >
> > Sadly, yes :) Consider this scenario (assuming we fix the locking at
> > the sidtab level):
> >
> > If it happens that a new SID (x) is added via the new sidtab and
> > then another one (y) via the old sidtab, then to avoid a clash of
> > SIDs we would need to leave a "hole" in the old sidtab for SID x.
> > And this will cause trouble if the thread that has just added SID y
> > then tries to translate the context string corresponding to SID x
> > (without re-taking the RCU read lock and refreshing the policy
> > pointer). Even if we handle skipping the "holes" in the old sidtab
> > safely, the translation would then end up adding a duplicate SID
> > entry for the context already represented by SID x - which is not a
> > state we want to end up in.
>
> Ah, yes, you're right. I was only thinking about the problem of
> adding an entry to the old sidtab, and not the (much more likely)
> case of an entry being added to the new sidtab. Bummer.
>
> Thinking aloud for a moment - what if we simply refused to add new
> sidtab entries if the task's sidtab pointer is "old"? Common sense
> would tell us that this scenario should be very rare at present, and
> I believe the testing mentioned in this thread adds some weight to
> that claim. After all, this only affects tasks which entered into
> their RCU protected session prior to the policy load RCU sync *AND*
> are attempting to add a new entry to the sidtab. That *has* to be a
> really low percentage, especially on a system that has been up and
> running for some time. My gut feeling is this should be safe as well;
> all of the calling code should have the necessary error handling in
> place, as there are plenty of reasons why we could normally fail to
> add an entry to the sidtab, memory allocation failures being the most
> obvious failure point I would suspect. The obvious downside to such
> an approach is that those operations which do meet this criterion
> would fail - and we should likely emit an error in this case - but is
> this failure really worse than any other transient kernel failure,

No, I don't like this approach at all. Before the sidtab refactor, it
had been done exactly this way - ENOMEM was returned while the sidtab
was "frozen" (i.e. while the existing entries were being converted).
And this was a real nuisance, because things would fail randomly during
policy reload. And it's not just unimportant explicit userspace actions
that can fail. Any kind of transition can lead to a new SID being
created, so you'd get things like execve(), mkdir(), ... returning
-ENOMEM sometimes. (With a low probability, but still...) I wouldn't
compare it to a memory allocation failure, which normally starts
happening only when the system becomes overloaded. Here the user would
*always* have some probability of getting this error, and they couldn't
do anything about it.
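Just to spell out the shape of the behavior I mean (a rough sketch from
memory, not the actual pre-refactor code - the "frozen" flag and the
sidtab_do_add_entry() helper are made-up names for illustration):

static int sidtab_context_to_sid_frozen_sketch(struct sidtab *s,
                                               struct context *context,
                                               u32 *sid)
{
        unsigned long flags;
        int rc;

        spin_lock_irqsave(&s->lock, flags);
        if (s->frozen) {
                /* conversion in progress: new entries are refused outright */
                rc = -ENOMEM;
                goto out_unlock;
        }
        rc = sidtab_do_add_entry(s, context, sid);
out_unlock:
        spin_unlock_irqrestore(&s->lock, flags);
        return rc;
}

Any task that happened to need a new SID while a reload was converting
the table simply got the error back, and there was nothing it could do
differently to avoid it.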
> and is attempting to mitigate this failure worth abandoning the RCU
> approach for the sidtab?

Perhaps it wasn't clear from what I wrote, but I certainly don't want
to abandon it completely - just to revert to a safe state until we
figure out how to do the RCU policy reload safely. The solution with
two-way conversion seems doable, it's just not a quick and easy fix.

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.