Illegal characters in filename strings: how to remove them

These are typical filenames I use in creating folders in a script.

linux_2020-04-29_09:20:55pm

If the user supplies a custom date format, I want to mask out anything that will cause problems on Linux.

This is what I found and was able to adapt. I don’t fully understand it. I want to

DateTimeStamp=$( date +"$DTS_tmp" | sed -e 's/[^A-Za-z0-9:.-]//g')

How do I adapt this to mask out illegal characters? I only need the minimum Linux requires, such as handling spaces in filenames.

As far as I know, only the / is illegal in a Unix/Linux file name, along with the NUL (x'00') character.

That "sed" part changes every character to "" (that is, it deletes the character), with the exception of letters, digits, and the 3 characters ":.-" (quotes are not part of that string).
As Henk says, only "/" is illegal, plus '\000' (NUL, binary zero).

I don’t know where you are using this. However, you might want to remove that “:” (colon) from the allowed characters if you are using Windows file systems. That’s an illegal character for Windows file names.
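To make the bracket expression concrete, here is a minimal sketch. The sample string and the variable names `raw`, `linux_safe`, and `windows_safe` are made up for illustration. The `^` right after `[` negates the set, so sed deletes everything that is not listed, and the `-` comes last so it is taken literally.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; not taken from the real script.
raw='linux 2020/04/29 09:20:55pm*?'

# Keep letters, digits, ':', '.' and '-'; delete everything else.
linux_safe=$(printf '%s\n' "$raw" | sed -e 's/[^A-Za-z0-9:.-]//g')

# Same idea, but ':' is dropped from the allowed set for Windows file systems.
windows_safe=$(printf '%s\n' "$raw" | sed -e 's/[^A-Za-z0-9.-]//g')

printf '%s\n' "$linux_safe"
printf '%s\n' "$windows_safe"
```

Note that this deletes the unwanted characters rather than replacing them, so the space and the slashes simply vanish from the result.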

Hi,

Maybe you really want sed, but just to show it: Parameter Expansion, which is specific to bash, works too without using sed.


var=':\)(*$%#@!linux_2020-04-29_09:20:55pm]+\;?'

echo "${var//[![:alnum:]]/_}"

Output

__________linux_2020_04_29_09_20_55pm______

That works on both bash 5.0.16 and **4.4.23**; it also works in a script without extglob.
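If replacing the dashes and dots is too aggressive, the same kind of expansion can keep a few punctuation characters. This is only a sketch; the input string and the names `stamp` and `clean` are invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical input, chosen only to exercise the pattern.
stamp='2020-04-29 09:20:55pm /tmp?'

# Replace every character that is not alphanumeric, '.', '_' or '-'
# with an underscore. The '-' is last in the set so it is literal.
clean=${stamp//[^[:alnum:]._-]/_}

printf '%s\n' "$clean"
```

Because the colon is not in the allowed set here, the result should also be acceptable on Windows file systems.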


        Date_Time_Stamp=$( date +"$DTS_tmp2" | tr ' ' '_' | sed -e 's/[^A-Za-z0-9:._-]//g')

        Current_Vault_Folder="$Virus_Vault_Folder/${OStype}_"
        Current_Vault_Folder+="$Date_Time_Stamp"
        mkdir "${Current_Vault_Folder}"


A user might scan a Windows partition, copy the virus files into that folder, and then transfer them to a Windows system. So I may need to restrict the characters Windows considers illegal as well. It depends on whether a typical user will do that.

Let's keep it simple and answer the question first.

  1. The forbidden printable ASCII characters are:
    • Linux/Unix:
      / (forward slash)
    • Windows:
      < (less than)
      > (greater than)
      : (colon - sometimes works, but is actually NTFS Alternate Data Streams)
      " (double quote)
      / (forward slash)
      \ (backslash)
      | (vertical bar or pipe)
      ? (question mark)
      * (asterisk)

  2. Non-printable characters
    If your data comes from a source that would permit non-printable characters then there is more to check for.
    • Linux/Unix:
      0 (NULL byte)
    • Windows:
      0-31 (ASCII control characters)

    Note: While it is legal under Linux/Unix file systems to create files with control characters in the filename, it might be a nightmare for the users to deal with such files.

  3. Reserved file names
    The following filenames are reserved:
    • Windows:
      CON, PRN, AUX, NUL
      COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
      LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
      (both on their own and with arbitrary file extensions, e.g. LPT1.txt).

  4. Other rules
    • Windows:
      Filenames cannot end in a space or dot.
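The rules listed above could be combined into one helper along these lines. This is a sketch, not a definitive implementation: `sanitize_name` and its variable names are hypothetical, it assumes bash 4 or later (for `${var^^}`), and it chooses to replace forbidden characters with `_` and to prefix reserved names with `_`, which is just one possible policy.

```shell
#!/usr/bin/env bash
# sanitize_name is a hypothetical helper, not part of the original script.
sanitize_name() {
    local name=$1
    # Printable characters forbidden on Windows; '/' also covers Linux.
    # The doubled backslash puts a literal backslash into the set.
    local forbidden='<>:"/\\|?*'
    name=${name//[$forbidden]/_}
    # Drop ASCII control characters.
    name=${name//[[:cntrl:]]/}
    # Strip trailing dots and spaces, which Windows does not allow.
    while [[ $name == *. || $name == *' ' ]]; do
        name=${name%?}
    done
    # Prefix reserved device names, with or without an extension.
    local base=${name%%.*}
    case ${base^^} in
        CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9]) name="_$name" ;;
    esac
    printf '%s\n' "$name"
}

sanitize_name 'linux_2020-04-29_09:20:55pm'
```

The helper does not handle a name that ends up empty; a caller would still need to check for that before calling mkdir.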

The bash version should be whichever is most common. Is sed included by default?

Reading the CFG file:
scanvirus configuration
Date[space]Time or Time[space]Date
date +'%Y-%m-%d %I:%M:%S%P'
DateTimeStamp= %Y-%m-%d %I:%M:%S%P


ExcludedScanFolders= dev etc kdeinit5__0 proc tmp srv sys .snapshots


The code clips.

               #remove everything after ';'
               #printf "%s\n" "$line"
               DTS_tmp1=${line#DateTimeStamp= *}
               #printf "%s\n" "$DTS_tmp1"
               DTS_tmp2=${DTS_tmp1%%;*}
               #printf "%s\n" "$DTS_tmp2"

               #check for valid date and time
               Date_Time_Stamp=$( date +"$DTS_tmp2" | tr ' ' '_' | sed -e 's/[^A-Za-z0-9:._-]//g')

        Current_Vault_Folder="$Virus_Vault_Folder/linux_"
        Current_Vault_Folder+="$Date_Time_Stamp"
        mkdir "${Current_Vault_Folder}"
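For anyone who wants to try the parsing step in isolation, here is a self-contained sketch. The config line, the fixed test time, and the names `fmt` and `stamp` are assumptions made for the example, and it relies on GNU date (`-d` and `%P`).

```shell
#!/usr/bin/env bash
# Hypothetical config line; the part after ';' stands in for a comment.
line='DateTimeStamp= %Y-%m-%d %I:%M:%S%P ; custom format'

fmt=${line#DateTimeStamp=}      # drop the key
fmt=${fmt%%;*}                  # drop everything from the first ';'
# Trim leading and trailing whitespace left over from the config line.
fmt=${fmt#"${fmt%%[![:space:]]*}"}
fmt=${fmt%"${fmt##*[![:space:]]}"}

# Format a fixed time (GNU date) so the result is reproducible,
# turn spaces into underscores, then delete anything not allowed.
stamp=$(LC_ALL=C date -u -d '2020-04-29 21:20:55' +"$fmt" \
        | tr ' ' '_' | sed -e 's/[^A-Za-z0-9:._-]//g')

# An empty result means the format produced nothing usable.
[ -n "$stamp" ] || { echo "empty date stamp" >&2; exit 1; }
printf '%s\n' "$stamp"
```

In the real script the fixed `-d` argument would be dropped so that the current time is used.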

After this, to make later output more readable, I convert '_' to '[space]'.
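That conversion can be done with parameter expansion alone. A minimal sketch, where `folder` and `display` are made-up names and the value is only an example:

```shell
#!/usr/bin/env bash
# Hypothetical folder name of the kind built earlier in the script.
folder='linux_2020-04-29_09:20:55pm'

# Replace every underscore with a space, for display only;
# the name on disk keeps its underscores.
display=${folder//_/ }
printf '%s\n' "$display"
```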

I need the folder name to be readable, but to hold no characters that are illegal on either Windows or Linux, while giving the user the most flexibility for the date time stamp.

I let through only some characters or mask out illegal characters.

Thanks to everyone for the help.

I’ve figured out how to solve the problem. :)