I’m not sure if this is the right place, but lately, when I hit Enter on an image file with a special character (in the sense of ≥ U+0080, for example German umlauts (äöü) or special punctuation (’, “, ”, …)) in its name or path, nothing happens. Pressing F3 doesn’t invoke the display command to print the image file properties (resolution, image type, EXIF data); only the hex view of the file is shown. Special characters below 0x80, i.e. ', ", *, etc., on the other hand are no problem.
My (and the system’s) mc.ext is the stock one, without alterations.
If I rename the file to add an .mp4 suffix and hit Enter, mplayer starts and tries to play the file, which means the video file detection accepts it but the image file detection does not. If I rename the file to something without the suspicious character, it works again.
The problematic character can be anywhere in the path: /home/alex/töst.jpg will not work, and /home/alex/töst/file.jpg won’t work either.
My locale is en_US.UTF-8. If I change the locale to “C”, mc shows the file name as /home/alex/t??st.jpg, but there is still no luck in opening the file.
Must be a recent thing, because I have images in countless directories where the path has some of these special characters in it, and they worked at least until a few weeks ago.
It is not a general mc issue, as it works on my work computer (running Ubuntu) with the same mc version.
While I’m glad I’m not the only one with this issue, on my system only image files are affected. I have various videos and their cover image, and while I can play the video by hitting enter, I cannot view the cover image by selecting that and pressing enter.
The way you wrote it seems like it is happening for all files…
Author: Christos Zoulas <email@example.com>
Date: Sat May 28 01:04:57 2022 +0000
PR/351: CathyKMeow: octalify unprintable characters in filenames unless raw.
The commit is first available in file 5.42. The “raw” option has been available for 20 years, so it should be safe to use by default today; but it also applies to the result (not only to file names), so it may require additional changes in MC.
file just checks each single byte for being printable, which of course fails for UTF-8 multi-byte characters. You should really open a bug report against file; the change does not look right.
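To illustrate why a per-byte printability check misfires on UTF-8 names, here is a minimal Python sketch. It is not the code from file; the helper name octalify_per_byte is made up for this example, and it simply assumes that every byte outside printable ASCII is replaced by a three-digit octal escape.

```python
def octalify_per_byte(name: str) -> str:
    """Hypothetical helper: escape each non-printable-ASCII byte as \\ooo."""
    out = []
    for b in name.encode("utf-8"):
        if 0x20 <= b <= 0x7E:
            # Printable ASCII byte: keep it unchanged.
            out.append(chr(b))
        else:
            # Anything else, including every byte of a multi-byte
            # UTF-8 sequence, becomes an octal escape.
            out.append("\\%03o" % b)
    return "".join(out)


print(octalify_per_byte("töst.jpg"))
```

Because “ö” is encoded as the two bytes 0xC3 0xB6, neither of which is printable ASCII, the name comes out as t\303\266st.jpg. A caller that compares this mangled name against the real file name will no longer find a match.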
P.S. Besides, it truncates the file name, which is a bug by itself (even if we accept non-ASCII mangling):
As @arvidjaar pointed out, this is a recent change in ‘file’ from May 28th.
And yes, I noticed the file shortening, too. Apparently, with UTF-8 file names, it takes the number of characters and truncates the string after so many bytes, not characters. Since UTF-8 will encode characters ≥ U+0080 with more than one byte, this truncates the file name eventually.
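The suspected mix-up between character count and byte count can be modelled in a few lines of Python. This only mirrors the hypothesis described above, not the actual code in file; the function name truncate_as_suspected is invented for illustration.

```python
def truncate_as_suspected(name: str) -> str:
    """Hypothetical model: measure length in characters, cut in bytes."""
    raw = name.encode("utf-8")
    n_chars = len(name)      # length counted in characters
    cut = raw[:n_chars]      # ...but applied to the byte string
    # errors="replace" keeps the sketch from raising if the cut lands
    # in the middle of a multi-byte sequence.
    return cut.decode("utf-8", errors="replace")


print(truncate_as_suspected("töst.jpg"))
```

“töst.jpg” has 8 characters but 9 bytes, so cutting after 8 bytes drops the final “g”. Every additional character ≥ U+0080 in the name costs at least one more byte at the end.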
And yes, the truncating is definitely an error (regression) in ‘file’. Dealing with the new output format of ‘file’ is something MC should handle. (they are also talking about using the -b option to drop the file name from the output altogether).
I have since learned that this all happened with files whose type was detected using the ‘file’ program, but not with files whose type was detected by their extension.
And I have only tried images (detected by ‘file’) and videos (detected by their extension)…
mc now has a bug open, and I have also filed a bug against ‘file’, because even if one uses raw mode for the file names, there is still an error that shortens the file name, which would also make correct detection impossible. Hopefully this gets sorted out soon.
File types are not detected “by extension”. File contents always are of some particular type (whether simply ASCII characters or random bytes), and people decided to let the names of files with the same content type end with the same suffix for ease of memorizing. Very often such a suffix consists of the . (dot) character followed by a few other characters. This looks very much like (and is probably inspired by) the so-called extension of the MS-DOS file systems, but it is not the same. E.g. in MS-DOS the . (dot) is not part of the extension (or of the file name); it is just there to show where the one stops and the other begins. In MS-DOS it is also much more integrated into the operating system.
Unix/Linux itself has no concept of metadata of files describing contents.
Some application programs are like human users, they think that a certain suffix points to a certain type of contents. They may be right, but may be wrong also.
The ‘file’ tool tries to find out what the type of the contents is by using heuristics on the contents itself, partly based on so-called “magic numbers” within the file. It has become pretty good at this task, and I would always trust it more than methods based on suffixes.
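As a rough idea of what a “magic number” check looks like, here is a small Python sketch. It is far simpler than what ‘file’ really does and is not derived from its magic database; the table MAGIC and the function sniff are illustrative names, and only a few well-known image signatures are listed.

```python
# A few well-known leading byte signatures (illustrative subset).
MAGIC = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff(data: bytes) -> str:
    """Guess a type from the first bytes of the content, ignoring the name."""
    for magic, mime in MAGIC.items():
        if data.startswith(magic):
            return mime
    return "application/octet-stream"


# The file name never enters the decision, only the content does.
print(sniff(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(sniff(b"just some text"))
```

A check like this gives the same answer whether the file is called töst.jpg or töst.jpg.mp4, which is exactly where it differs from suffix-based guessing.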
bor@bor-Latitude-E5450:~$ head -n 5 /usr/share/mime/globs
# This file was automatically generated by the
# update-mime-database command. DO NOT EDIT!
I said above, for human beings and application programs. In this case for the application suite called “desktop” as you point to freedesktop.org.
MIME types are a bit different: they are defined for typing files on the Internet, independent of extensions and suffixes.
The table you show (part of) shows how the freedesktop community (or whatever the name is) sees the connection between MIME types on the Internet and suffixes that can be used as a substitute on local files to make desktop programs “understand” what (hopefully) might be in the file that came from the Internet.
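For readers who have not looked at that table: each non-comment line of the globs file pairs a MIME type with a file name pattern, separated by a colon. The Python sketch below shows how an application might use it to guess a type from the name alone. The two entries in SAMPLE_GLOBS are illustrative examples of the format, not the output of the command quoted above, and parse_globs and guess_by_name are made-up names.

```python
import fnmatch

# Illustrative sample in the "mime/type:pattern" format; the real file
# is much longer.
SAMPLE_GLOBS = """\
# This file was automatically generated by the
# update-mime-database command. DO NOT EDIT!
image/jpeg:*.jpg
video/mp4:*.mp4
"""


def parse_globs(text):
    """Return a list of (pattern, mime_type) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        mime, _, pattern = line.partition(":")
        rules.append((pattern, mime))
    return rules


def guess_by_name(filename, rules):
    """Guess a MIME type from the file name only; contents are never read."""
    for pattern, mime in rules:
        if fnmatch.fnmatch(filename.lower(), pattern):
            return mime
    return None


rules = parse_globs(SAMPLE_GLOBS)
print(guess_by_name("töst.jpg", rules))
print(guess_by_name("töst.jpg.mp4", rules))
```

Under these sample rules, renaming töst.jpg to töst.jpg.mp4 flips the guess from image/jpeg to video/mp4 although the contents are unchanged, which is the kind of suffix-based assumption being discussed.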
And in an Apache configuration, you will find it the other way around. There the web manager can define (and there are already suitable defaults) which suffix has to be sent off to a client with which MIME type.
All of this is independent of the operating system, and it is NOT something Unix/Linux bothers about.
Still, the current implementation of midnight commander detects images using the ‘file’ command (by analyzing the file) and if you give a file name the extension .mp4, it is happy and invokes mplayer (or whatever video player you have configured) and doesn’t bother analyzing further, it simply assumes it’s a video file.
Do you have the slightest idea what freedesktop is and provides? They also have weird ideas about how to create desktop files, how to autostart programs, how to build menus and a lot more. Strange people …
It is NOT something Unix/Linux bothers about.
It is something that is used by at least two major desktops on Unix/Linux (KDE and GNOME) to detect file type not counting all other applications using shared MIME specification.
I do not deny that, but it is still only a bunch of applications, not the Unix/Linux system. And for people who do not understand the difference between applications (including desktops that may or may not be present on a system) and the system, understanding Unix/Linux will stay difficult.