From: Anton Altaparmakov
To: David Howells
CC: Jann Horn, Al Viro, Linux API, Linus Torvalds, "linux-fsdevel@vger.kernel.org", kernel list
Subject: Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]
Date: Sat, 28 Jul 2018 00:14:14 +0000
Message-ID: <7C807D58-6B8C-400F-AF67-CD2F38BC0AE4@tuxera.com>
References: <153271267980.9458.7640156373438016898.stgit@warthog.procyon.org.uk> <153271291017.9458.7827028432894772673.stgit@warthog.procyon.org.uk> <21395.1532735340@warthog.procyon.org.uk>
In-Reply-To: <21395.1532735340@warthog.procyon.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi David,

> On 28 Jul 2018, at 00:49, David Howells wrote:
> Jann Horn wrote:
>>> +static int fsinfo_generic_name_encoding(struct dentry *dentry, char *buf)
>>> +{
>>> +	static const char encoding[] = "utf8";
>>> +
>>> +	if (buf)
>>> +		memcpy(buf, encoding, sizeof(encoding) - 1);
>>> +	return sizeof(encoding) - 1;
>>> +}
>>
>> Is this meant to be "encoding to be used by userspace" or "encoding of
>> on-disk filenames"?
>
> The latter.
>
>> Are there any plans to create filesystems that behave differently?
>
> isofs, fat, ntfs, cifs for example.
>
>> If the latter: This is wrong for e.g. a vfat mount that uses a codepage,
>> right? Should the default in that case not be "I don't know"?
>
> Quite possibly. Note that it could also be what you're interpreting it as
> because the codepage got overridden by a mount parameter rather than what's on
> the disk (assuming the medium actually records this).

No, nothing like that is recorded on disk. That would have been way too helpful! (-; The only place Windows records such information is, you may have guessed this: in the registry which of course is local to the computer and unrelated to what removable media is attached...

> One thing I'm confused about is that fat has both a codepage and a charset and
> I'm not sure of the difference.

Oh that is quite simple. (-:

The codepage is what is used to translate from/to the on-disk DOS 8.3 style names into the kernel's Unicode character representation.
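
To make the codepage step concrete, here is a minimal sketch of decoding the bytes of a short name under two different codepages. It is an illustration only, not the kernel's NLS code: the names demo_codepage, demo_cp_to_unicode and demo_decode_short_name are hypothetical, and the table covers just three byte values whose CP437 and CP1252 meanings differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector for this demo; real codepages have 128+ entries. */
enum demo_codepage { DEMO_CP437, DEMO_CP1252 };

/* Map one short-name byte to a Unicode code point under the given codepage. */
static uint32_t demo_cp_to_unicode(enum demo_codepage cp, uint8_t byte)
{
	if (byte < 0x80)
		return byte;	/* ASCII range is the same in both codepages */

	switch (byte) {
	case 0x8E:	/* CP437: A-diaeresis, CP1252: Z-caron */
		return cp == DEMO_CP437 ? 0x00C4 : 0x017D;
	case 0x99:	/* CP437: O-diaeresis, CP1252: trade mark sign */
		return cp == DEMO_CP437 ? 0x00D6 : 0x2122;
	case 0x9A:	/* CP437: U-diaeresis, CP1252: s-caron */
		return cp == DEMO_CP437 ? 0x00DC : 0x0161;
	default:
		return 0xFFFD;	/* not in this tiny demo table */
	}
}

/* Decode len name bytes into out[]; returns the number of code points. */
static size_t demo_decode_short_name(enum demo_codepage cp,
				     const uint8_t *name, size_t len,
				     uint32_t *out)
{
	size_t i;

	assert(name != NULL && out != NULL);
	for (i = 0; i < len; i++)
		out[i] = demo_cp_to_unicode(cp, name[i]);
	return len;
}
```

The same six on-disk bytes 'M', 0x9A, 'L', 'L', 'E', 'R' decode to U+00DC in the second position under CP437 but to U+0161 under CP1252, so which name you see depends entirely on the codepage the reader assumes.
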
The correct codepage for a particular volume is not stored on disk, so it can lead to all sorts of fun: if you, for example, create some names on a Japanese Windows on a FAT formatted USB stick and then plug that into a US or European Windows, where the default codepages are completely different, all your filenames will appear totally corrupt. (Note this ONLY affects 8.3 style/DOS/short names or whatever you want to call them.)

The charset on the other hand is what is used to convert strings coming in from/going out to userspace into the kernel's Unicode character representation.

The one nice thing about VFAT (and there aren't many nice things about it!) is that for long names (i.e. not the 8.3 style/DOS/short names), it actually stores little-endian UTF-16 on disk (since Windows 2000; before that it used little-endian UCS-2 - the change was needed to support things like emoji and some languages that go outside the UCS-2 range of fixed 16-bit Unicode).

Hope this clears that up.

Best regards,

	Anton

> David

-- 
Anton Altaparmakov (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer
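
As an illustration of the UTF-16 versus UCS-2 point above: a UCS-2 reader treats every 16-bit unit as one character, while a UTF-16 reader combines a surrogate pair into a single code point above U+FFFF. The following is a minimal sketch under that assumption; demo_le16 and demo_utf16_decode are hypothetical helpers written for this example, not functions from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one little-endian 16-bit unit, as long names are stored on disk. */
static uint16_t demo_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode one code point from UTF-16 units that are already in host order.
 * Returns the number of units consumed (1 or 2), or 0 if the input is
 * empty, truncated, or contains an unpaired surrogate.
 */
static size_t demo_utf16_decode(const uint16_t *units, size_t len,
				uint32_t *cp)
{
	uint16_t hi, lo;

	assert(units != NULL && cp != NULL);
	if (len == 0)
		return 0;

	hi = units[0];
	if (hi < 0xD800 || hi > 0xDFFF) {
		*cp = hi;	/* plain 16-bit character, same as UCS-2 */
		return 1;
	}
	if (hi > 0xDBFF || len < 2)
		return 0;	/* lone low surrogate, or pair cut short */

	lo = units[1];
	if (lo < 0xDC00 || lo > 0xDFFF)
		return 0;	/* high surrogate not followed by a low one */

	*cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10) |
			 (uint32_t)(lo - 0xDC00));
	return 2;
}
```

For instance, the on-disk bytes 3D D8 00 DE form the units 0xD83D 0xDE00, which decode to the single code point U+1F600; read as UCS-2, the same bytes would be two separate 16-bit values with no assigned character.
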