Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63CC6C4332F for ; Fri, 17 Dec 2021 22:03:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231225AbhLQWDC (ORCPT ); Fri, 17 Dec 2021 17:03:02 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:32199 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231211AbhLQWC5 (ORCPT ); Fri, 17 Dec 2021 17:02:57 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639778576; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KoZY8J6zdihNxPty00DekWdMgSgINIRofX014qBfuXc=; b=b3ByYYAU/iKiN4JGsonPzBR3/8b0ovzTpDcExPJEhiDfgZDR3ur1tSziDSZ13LIo7s5pfM WOcqBLaQKmY71PuXq/e1/iHKKXCneJJxphX74yjB0QgiXbiCH0CSEOT/vTnuOOfUaU0QWl Dc/62chGivOaI60CDdPjxp05kd+BTCU= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-550-edhEXfWWPPGDpk-CWos9KQ-1; Fri, 17 Dec 2021 17:02:55 -0500 X-MC-Unique: edhEXfWWPPGDpk-CWos9KQ-1 Received: by mail-wr1-f70.google.com with SMTP id r7-20020adfbb07000000b001a254645f13so923198wrg.7 for ; Fri, 17 Dec 2021 14:02:55 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent :content-language:to:cc:references:from:organization:subject :in-reply-to:content-transfer-encoding; bh=KoZY8J6zdihNxPty00DekWdMgSgINIRofX014qBfuXc=; b=e6sikyKNgShFyGRZddEZAbhZPk8eMbsnSbaEjKmpnRsyr3hlQwUaDJzt+y7tOayMu4 sKMOm92GClSlXd1AOHVY6q/B+4hdWGgMFvMWfrrqFLaByqOccXYp1QxIeqQbxNOhVDdy jOmBDQj8p337+ZjPJcFqcpqpN7aTrE6BpMoGq53wELxTD3al0KZCgv5hdfZTVbg2EtMP nZgVF0O4MYaIWuxNBqm1aBnWw5Ona9HlqOM5Enyl9PQ/Xt92ESa1n+rNhABd/uEM8SLL bJDglbcLmls2VJ+uDPyv+IZ2ioOUWDXuCFjKivG9CBhdG74XLlJoKBvYPB7zwyyj197w 9HqA== X-Gm-Message-State: AOAM531qv7g7aqU6bgkfaGImRa8iOLBS+wD3joIkjxk2+P/Lfwm5e96Y feA5pW2ldlrU4HmDf/KD8q2kb1BB3Cj1lfidmMioBhdWeINZwlS0W9b+D69Da4LR6gPgLBBDdJK l5oCOPJfr/98qm7PaITWHsEiq X-Received: by 2002:adf:eb0f:: with SMTP id s15mr4024607wrn.690.1639778574331; Fri, 17 Dec 2021 14:02:54 -0800 (PST) X-Google-Smtp-Source: ABdhPJxuRL5qmAQJevVVPuYVbWt7BvxHm1ZgAzZFI4yLinNO0rt5Hd/+vPnwjD/ORD2+QLU8xQHpiA== X-Received: by 2002:adf:eb0f:: with SMTP id s15mr4024592wrn.690.1639778574107; Fri, 17 Dec 2021 14:02:54 -0800 (PST) Received: from [192.168.3.132] (p4ff234b8.dip0.t-ipconnect.de. [79.242.52.184]) by smtp.gmail.com with ESMTPSA id w25sm8541965wmk.20.2021.12.17.14.02.52 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 17 Dec 2021 14:02:53 -0800 (PST) Message-ID: <644daada-1d77-68bb-1682-fb67f0b427f0@redhat.com> Date: Fri, 17 Dec 2021 23:02:52 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0 Content-Language: en-US To: Thomas Gleixner , linux-kernel@vger.kernel.org Cc: Andrew Morton , Hugh Dickins , Linus Torvalds , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , linux-mm@kvack.org, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , Boqun Feng , Jonathan Corbet References: <20211217113049.23850-1-david@redhat.com> <20211217113049.23850-2-david@redhat.com> <87h7b6c44o.ffs@tglx> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v1 01/11] seqlock: provide lockdep-free raw_seqcount_t variant In-Reply-To: <87h7b6c44o.ffs@tglx> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 17.12.21 22:28, Thomas Gleixner wrote: > On Fri, Dec 17 2021 at 12:30, David Hildenbrand wrote: >> Sometimes it is required to have a seqcount implementation that uses >> a structure with a fixed and minimal size -- just a bare unsigned int -- >> independent of the kernel configuration. This is especially valuable, when >> the raw_ variants of the seqlock function will be used and the additional >> lockdep part of the seqcount_t structure remains essentially unused. >> >> Let's provide a lockdep-free raw_seqcount_t variant that can be used via >> the raw functions to have a basic seqlock. >> >> The target use case is embedding a raw_seqcount_t in the "struct page", >> where we really want a minimal size and cannot tolerate a sudden grow of >> the seqcount_t structure resulting in a significant "struct page" >> increase or even a layout change. > Hi Thomas, thanks for your feedback! > Cannot tolerate? Could you please provide a reason and not just a > statement? Absolutely. "struct page" is supposed to have a minimal size with a fixed layout. Embedding something inside such a structure can change the fixed layout in a way that it can just completely breaks any assumptions on location of values. Therefore, embedding a complex structure in it is usually avoided -- and if we have to (spin_lock), we work around sudden size increases. There are ways around it: allocate the lock and only store the pointer in the struct page. But that most certainly adds complexity, which is why I want to avoid it for now. I'll extend that answer and add it to the patch description. > >> Provide raw_read_seqcount_retry(), to make it easy to match to >> raw_read_seqcount_begin() in the code. >> >> Let's add a short documentation as well. >> >> Note: There might be other possible users for raw_seqcount_t where the >> lockdep part might be completely unused and just wastes memory -- >> essentially any users that only use the raw_ function variants. > > Even when the reader side uses raw_seqcount_begin/retry() the writer > side still can use the non-raw variant which validates that the > associated lock is held on write. Yes, that's my understanding as well. > > Aside of that your proposed extra raw sequence count needs extra care > vs. PREEMPT_RT and this want's to be very clearly documented. Why? > > The lock association has two purposes: > > 1) Lockdep coverage which unearthed bugs already Yes, that's a real shame to lose. > > 2) PREEMPT_RT livelock prevention > > Assume the following: > > spin_lock(wrlock); > write_seqcount_begin(seq); > > ---> preemption by a high priority reader > > seqcount_begin(seq); <-- live lock > > The RT substitution does: > > seqcount_begin(seq) > cnt = READ_ONCE(seq->sequence); > > if (cnt & 1) { > lock(s->lock); > unlock(s->lock); > } > > which prevents the deadlock because it makes the reader block on > the associated lock, which allows the preempted writer to make > progress. > > This applies to raw_seqcount_begin() as well. > > I have no objections against the construct itself, but this has to be > properly documented vs. the restriction this imposes. Absolutely, any input is highly appreciated. But to mention it again: whatever you can do with raw_seqcount_t, you can do with seqcount_t, and there are already users relying completely on the raw_ function variants (see my other reply). So the documentation should most probably be extended to cover the raw_ functions and seqcount_t in general. > > As you can see above the writer side therefore has to ensure that it > cannot preempted on PREEMPT_RT, which limits the possibilities what you > can do inside a preemption (or interrupt) disabled section on RT enabled > kernels. See Documentation/locking/locktypes.rst for further information. It's going to be used for THP, which are currently incompatible with PREEMPT_RT (disabled in the Kconfig). But preemption is also disabled because we're using bit_spin_lock(), which does a bit_spin_lock(). Certainly worth documenting! Thanks for your input! -- Thanks, David / dhildenb