I have about 300 files that need renaming, because the file system does not display the French characters properly. The dodgy letter in question has been replaced by a “question mark in a black diamond” symbol.
No way of renaming, other than using mv in Konsole, has worked. Is there a script or program out there that will do a batch rename?
Is this a Linux filesystem (like ext2/3) or a non-Linux one? I ask because the fact that the filenames are displayed in the wrong character encoding points to something like that. Maybe a mount option could help here.
As for a mass/batch/scripted way to rename (mv) them:
You already found out that mv on the console can do this, but even then there are two possibilities:
1) you start typing the filename and then use filename completion with the Esc key;
2) you type the filename, but use wildcards (? or *) in place of the non-typable character.
Now 1) cannot be scripted. The second one can be scripted if there is some regularity in the filenames, but when they are just random (French) words it will be difficult. Programming is good at repeating the same thing, but not so good at doing everything differently.
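The second option (a wildcard in place of the untypable character) can be sketched in Python for the case where you know the intended new name of each file. This is only an illustration, not something posted in this thread: `rename_by_pattern` is a hypothetical helper, and it assumes the broken part is a single byte. On Linux, Python represents each undecodable byte of a filename as one placeholder character, so a `?` matches it. The helper refuses to act unless exactly one entry matches, so a loose pattern cannot hit the wrong file.

```python
import fnmatch
import os


def rename_by_pattern(directory: str, pattern: str, new_name: str) -> str:
    """Rename the one entry in `directory` matching the shell-style `pattern`.

    Hypothetical helper: `?` in the pattern stands for the character you
    cannot type (e.g. "caf?.txt"). Returns the old name that was replaced.
    """
    matches = [name for name in os.listdir(directory)
               if fnmatch.fnmatchcase(name, pattern)]
    if len(matches) != 1:
        # Zero or several matches: do nothing rather than guess.
        raise ValueError(f"expected one match for {pattern!r}, found {matches!r}")
    target = os.path.join(directory, new_name)
    if os.path.lexists(target):
        raise FileExistsError(target)  # never overwrite an existing file
    os.rename(os.path.join(directory, matches[0]), target)
    return matches[0]
```

For example, `rename_by_pattern(".", "caf?.txt", "café.txt")` does roughly what `mv caf?.txt café.txt` does in the shell, but stops instead of guessing when the pattern is ambiguous.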
And if you just want to get rid of that one character, there is the
rename command, which nobody ever uses because ‘mv’ does most of the renaming in
the world. See the man page for details, but here’s an example:
To get rid of the first ‘ê’ character in every file name in the current
directory and replace it with nothing:
rename 'ê' '' *
or replace it with something else, like a regular ASCII ‘e’:
rename 'ê' 'e' *
and then I ran the second command above and had the following:
<quote>
ab@mybox0:~/Desktop/test0> ls
test test00 test01 test02 test03 test04 test05 tst06
</quote>
The first parameter is what you replace, the second is what you replace it
with, and the third is which files you match (all in my case).
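For anyone who wants the same first-occurrence substitution from a script, here is a rough Python equivalent. It is a sketch, not a reimplementation of `rename`: the name `rename_first` is hypothetical, it only edits the last component of each path, and it deliberately refuses to overwrite an existing file.

```python
import os
import sys


def rename_first(old: str, new: str, paths: list, dry_run: bool = False) -> list:
    """Replace the first occurrence of `old` in each file name with `new`.

    Hypothetical helper modelled on `rename old new files...`; names that do
    not contain `old` are skipped. Returns the (source, target) pairs handled.
    """
    if not old:
        raise ValueError("the text to replace must not be empty")
    renamed = []
    for path in paths:
        head, base = os.path.split(path)
        if old not in base:
            continue  # nothing to replace in this name
        target = os.path.join(head, base.replace(old, new, 1))
        if os.path.lexists(target):
            print(f"skipping {path!r}: {target!r} already exists", file=sys.stderr)
            continue
        if not dry_run:
            os.rename(path, target)
        renamed.append((path, target))
    return renamed
```

Something like `rename_first("ê", "e", sorted(os.listdir(".")))` corresponds to running `rename 'ê' 'e' *`. Like `rename`, it only matches names that really contain the character ê; a name holding a byte from some other encoding will not match.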
Good luck.
On 12/10/2010 09:06 AM, hcvv wrote:
> You are welcome. It is one of the basic features of most shells since
> about the 1970’s.
Thank you. That is very handy to know, since I didn’t even realise there was a rename command. I made a note of it. One learns something new every day.
It doesn’t seem to work with my files, though. The French characters are displayed as � in Dolphin and ? in Konsole.
Why is that, anyway? I get that with German letters too. When I name the files myself, my system recognizes them, but if I’m being sent files, it shows the silly sign.
As I hinted above, it has something to do with the character encoding. When you are sent files (how?), the names might be encoded in Latin-1 (or some proprietary MS encoding). When you store them under that sequence of bytes (rather than giving them a name yourself) in a file system that assumes UTF-8 encoding, you have a problem: when the names are interpreted, bytes are found that cannot be part of a UTF-8 encoding, and these are shown as ? or �.
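The effect can be reproduced in a few lines of Python. This is an illustration with a made-up file name, and it assumes the sender's system used Latin-1 (ISO-8859-1), where ê is stored as the single byte 0xEA, whereas UTF-8 stores it as the two bytes 0xC3 0xAA. That difference is also a likely reason why a `rename 'ê' ...` typed in a UTF-8 Konsole finds nothing: the bytes you type are not the bytes in the name.

```python
def as_seen_by_utf8(raw: bytes) -> str:
    """Decode filename bytes as a UTF-8 desktop would; undecodable bytes
    become U+FFFD, the 'question mark in a black diamond'."""
    return raw.decode("utf-8", errors="replace")


def latin1_to_utf8(raw: bytes) -> bytes:
    """Reinterpret the bytes as Latin-1 (an assumption) and re-encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


raw = "fête.txt".encode("latin-1")    # b'f\xeate.txt': ê is the single byte 0xEA
typed = "ê".encode("utf-8")            # b'\xc3\xaa': what a UTF-8 terminal sends for ê

shown = as_seen_by_utf8(raw)           # 'f\ufffdte.txt'
found = typed in raw                   # False: the typed bytes never occur in the name
fixed = latin1_to_utf8(raw)            # b'f\xc3\xaate.txt', displayed as fête.txt
```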
> When I name the
> files myself, my system recognizes them, but if I’m being sent files, it
> shows the silly sign.
That’s crucial information!
Those files were probably created in Windows, weren’t they? Then surely they
are using a Windows charset, and as far as I know, Windows filesystems use
a charset that depends on the user’s settings. At least, this is true for
FAT; I’m not sure about NTFS.
(which is why hcvv asked what filesystem you were using)
On the other hand, Linux, or at least our Linux distro, uses the UTF-8 charset
for filesystems (which can use several bytes per char if needed).
Then, there is the medium you use to interchange files. If it is email, the
name has to be encoded somehow, probably in the same charset used for the
rest of the email. You can find out by looking at the raw email text with mc.
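If the files arrive as mail attachments, the raw message usually carries the name in an encoded form. As a sketch, assuming the name uses an RFC 2047 encoded-word (the header value below is invented for illustration), Python's standard `email.header` module shows both the raw bytes and the charset the sender declared. Names can also be sent with RFC 2231 parameter encoding (`filename*=`), which this snippet does not handle.

```python
from email.header import decode_header, make_header

# Hypothetical filename value as it might appear in the raw message text.
raw_value = "=?ISO-8859-1?Q?caf=E9.txt?="

# decode_header() splits the value into (bytes, charset) pieces.
parts = decode_header(raw_value)
name_bytes, charset = parts[0]          # b'caf\xe9.txt' and the declared charset

# make_header() decodes each piece using its declared charset.
name = str(make_header(parts))          # 'café.txt'
```

The declared charset is what a mail client should use when it creates the file; a client that ignored it and wrote the bytes unchanged would produce exactly the broken names described in this thread.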
It is the email application that is responsible for creating the files when you
receive them. You could try another client.
That’s the why. I don’t know how to solve it - except by renaming each file
as you get them.
–
Cheers / Saludos,
Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)
Looks like a useful tool, and exactly right for this case. The commentary in the video is not very precise, though: it uses the words language, script and encoding as if they were all the same.
Also, UTF-8 is not a Linux standard; rather, Linux uses open standards when possible, and UTF-8 encoded Unicode is an open standard.
Also, the list at the end shows that MS has its own range of encodings that differ from the ISO ones (often only slightly). Thus, finding out which encoding is used on the filenames one receives (and what name the tool uses for it) may be the main task, especially as the sender will not tell you.
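One way to narrow the guess down is to decode the same bytes with several candidate encodings and compare the results. The sketch below is an aid for a human decision, not a detector: the candidate list and the rule that rejects C1 control characters (U+0080 to U+009F, which rarely appear in real names and often indicate Windows-1252 text misread as ISO-8859-1) are assumptions.

```python
# Assumed candidates; adjust to what your senders are likely to use.
CANDIDATES = ("utf-8", "cp1252", "iso-8859-1", "iso-8859-15", "cp850")


def candidate_names(raw: bytes, encodings=CANDIDATES) -> dict:
    """Map each encoding that can decode `raw` to the resulting text."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # these bytes are impossible in this encoding
    return result


def plausible(names: dict) -> dict:
    """Drop decodings that contain C1 control characters (a heuristic)."""
    return {enc: text for enc, text in names.items()
            if not any(0x80 <= ord(ch) <= 0x9F for ch in text)}


raw = b"5\x80.txt"                  # invented example name
names = candidate_names(raw)        # utf-8 fails; cp1252 gives '5€.txt',
                                    # iso-8859-1 and iso-8859-15 give a control char,
                                    # cp850 gives '5Ç.txt'
likely = plausible(names)           # keys: 'cp1252' and 'cp850'
```

Even after filtering, more than one answer can survive, as here; at that point, looking at the results (or asking the sender) has to decide.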
My point here is that it seems safe to run it recursively (once for each character set), so one can run it from a higher directory. It won’t touch files which are already UTF-8.
gives you the switches. And if you’re new to this: the asterisk means that all directories and files (i.e. those with special characters) are changed.
However, a very few files did not respond, but this was due to mathematical characters inserted instead of Latin/Western European special characters (i.e. these files were not meant to work very well in the first place).
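The same idea, converting recursively while leaving names that are already valid UTF-8 alone, can be sketched in Python. This is not the tool discussed above and makes no claim about its switches: `convert_tree` is a hypothetical name, the source encoding (Latin-1 by default) is an assumption you must supply, and it only reports what it would do unless called with `apply=True`. It works on bytes paths so that undecodable names survive unchanged, and it walks the tree bottom-up so that renaming a directory never invalidates paths still to be visited.

```python
import os
import sys


def is_utf8(name: bytes) -> bool:
    """True if the raw name is already valid UTF-8 (such names are left alone)."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_tree(top: bytes, src_encoding: str = "latin-1",
                 apply: bool = False) -> list:
    """Re-encode non-UTF-8 names under `top` from `src_encoding` to UTF-8.

    Hypothetical helper. Returns the planned (old, new) path pairs; only
    renames when apply=True, so the default is a dry run.
    """
    planned = []
    # topdown=False visits the contents of a directory before the directory
    # itself, so renaming a directory cannot break paths still to be visited.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in dirnames + filenames:
            if is_utf8(name):
                continue
            try:
                new_name = name.decode(src_encoding).encode("utf-8")
            except UnicodeDecodeError:
                print(f"cannot decode {name!r} as {src_encoding}", file=sys.stderr)
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if os.path.lexists(new_path):
                print(f"skipping {old_path!r}: target exists", file=sys.stderr)
                continue
            planned.append((old_path, new_path))
            if apply:
                os.rename(old_path, new_path)
    return planned
```

Calling `convert_tree(b".")` first and reading the plan before running it again with `apply=True` is the safer order.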