Thread: illegal characters in filename strings, how to remove

  1. #1


    These are typical filenames I use in creating folders in a script.

    linux_2020-04-29_09:20:55pm

    If the user supplies a custom date format, I want to mask out anything that will cause problems on Linux.

    This is what I found and was able to adapt. I don't fully understand it. I want to



    DateTimeStamp=$( date +"$DTS_tmp" | sed -e 's/[^A-Za-z0-9:._-]/_/g')

    How do I adapt this to mask out illegal characters? Only the minimum Linux needs, such as spaces in filenames.

  2. #2

    As far as I know only the / is illegal in a Unix/Linux file name. Maybe also the NULL (x'00') character.
    Henk van Velden

  3. #3

    Originally posted by lord_valarian:
    This is what I found and was able to adapt. I don't fully understand it. I want to

    DateTimeStamp=$( date +"$DTS_tmp" | sed -e 's/[^A-Za-z0-9:._-]/_/g')
    That "sed" part changes every character to "_", except letters, digits, and the 4 characters ":._-" (quotes are not part of that string).
    As Henk says, only "/" is illegal, along with '\000' (NUL, binary zero).

    I don't know where you are using this. However, you might want to remove that ":" (colon) from the allowed characters if you are using Windows file systems. That's an illegal character for Windows file names.
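To make the bracket expression concrete, here is a minimal sketch that runs the same sed substitution on a fixed string instead of live date output. The sample string is made up for illustration, and the second variant only assumes you want to drop ":" from the allowed set for Windows file systems.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; not taken from the original script.
sample='2020-04-29 09:20:55pm /tmp?'

# Keep letters, digits and ":._-"; replace every other character with "_".
masked=$(printf '%s\n' "$sample" | sed -e 's/[^A-Za-z0-9:._-]/_/g')
printf '%s\n' "$masked"        # 2020-04-29_09:20:55pm__tmp_

# Same substitution with ":" removed from the allowed set.
masked_win=$(printf '%s\n' "$sample" | sed -e 's/[^A-Za-z0-9._-]/_/g')
printf '%s\n' "$masked_win"    # 2020-04-29_09_20_55pm__tmp_
```

The "-" is placed last inside the brackets so that sed reads it as a literal hyphen rather than as part of a range.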

  4. #4

    Hi,

    Maybe you really want sed, but just to show that parameter expansion, which is specific to bash, also works without sed:
    Code:
    var=':\)(*$%#@!linux_2020-04-29_09:20:55pm[]+\;?'
    Code:
    echo "${var//[![:alnum:]]/_}"
    Output
    Code:
    __________linux_2020_04_29_09_20_55pm______
    That works on both bash 5.0.16 and 4.4.23. It also works in a script, without extglob.
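If replacing every non-alphanumeric character is too aggressive, the same expansion can carry a slightly wider allowed set. This is only a sketch: the sample value and the choice of ".", "_" and "-" as extra allowed characters are assumptions, not something taken from the script in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sample input, not taken from the original script.
var='linux 2020-04-29 09:20:55pm/?'

# Replace everything except alphanumerics, ".", "_" and "-" with "_".
# The "-" comes last inside the brackets so it is read literally.
safe=${var//[![:alnum:]._-]/_}
printf '%s\n' "$safe"    # linux_2020-04-29_09_20_55pm__
```

Because ":" and "/" are not in the allowed set, the result should be usable as a single path component on both Linux and Windows file systems.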

  5. #5

    Code:
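    # format the date, turn spaces into '_', then drop any other disallowed character
    Date_Time_Stamp=$( date +"$DTS_tmp2" | tr ' ' '_' | sed -e 's/[^A-Za-z0-9:._-]//g')

    # build the vault folder name from the OS type and the time stamp
    Current_Vault_Folder="$Virus_Vault_Folder/${OStype}_"
    Current_Vault_Folder+="$Date_Time_Stamp"
    mkdir "${Current_Vault_Folder}"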
    A user might scan a Windows partition and copy the virus files into that folder, then transfer them to a Windows system. So I should restrict the characters that are illegal on Windows as well. It depends on whether a typical user would do that.

    https://stackoverflow.com/questions/...irectory-names

    Let's keep it simple and answer the question, first.

    1. The forbidden printable ASCII characters are:

      • Linux/Unix:
        / (forward slash)
      • Windows:
        < (less than)
        > (greater than)
        : (colon - sometimes works, but is actually NTFS Alternate Data Streams)
        " (double quote)
        / (forward slash)
        \ (backslash)
        | (vertical bar or pipe)
        ? (question mark)
        * (asterisk)

    2. Non-printable characters
      If your data comes from a source that would permit non-printable characters then there is more to check for.

      • Linux/Unix:
        0 (NULL byte)
      • Windows:
        0-31 (ASCII control characters)

      Note: While it is legal under Linux/Unix file systems to create files with control characters in the filename, it might be a nightmare for the users to deal with such files.

    3. Reserved file names
      The following filenames are reserved:

      • Windows:
        CON, PRN, AUX, NUL
        COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
        LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
        (both on their own and with arbitrary file extensions, e.g. LPT1.txt).

    4. Other rules

      • Windows:
        Filenames cannot end in a space or dot.
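The rules quoted above can be sketched as one small shell function. Treat it as an illustration under assumptions rather than a finished implementation: `sanitize_name` is a hypothetical helper name, replacing with "_" is just one possible policy, and the NUL byte is not handled because a shell variable cannot hold it anyway.

```shell
#!/usr/bin/env bash
# sanitize_name is a hypothetical helper, not part of the original script.
sanitize_name() {
    local name base
    # Replace the Windows-forbidden printable characters (this covers "/"),
    # then the ASCII control characters, with "_".
    name=$(printf '%s' "$1" | tr '<>:"/\\|?*' '_' | tr '[:cntrl:]' '_')
    # Windows file names cannot end in a space or a dot.
    while :; do
        case $name in
            *.|*' ') name=${name%?} ;;
            *) break ;;
        esac
    done
    # Prefix reserved device names, with or without an extension.
    base=$(printf '%s' "${name%%.*}" | tr '[:lower:]' '[:upper:]')
    case $base in
        CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9]) name="_$name" ;;
    esac
    printf '%s\n' "$name"
}

sanitize_name 'linux_2020-04-29_09:20:55pm'    # linux_2020-04-29_09_20_55pm
sanitize_name 'LPT1.txt'                       # _LPT1.txt
```

Spaces inside the name are left alone here because neither system forbids them. Add a space to the first `tr` set if you also want those masked.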

  6. #6

    That version of bash should be the most common one. Is sed included by default?

    Reading the CFG file:
    ______________________________scanvirus configuration______________________________
    Date[space]Time or Time[space]Date
    date +'%Y-%m-%d %I:%M:%S%P'
    DateTimeStamp= %Y-%m-%d %I:%M:%S%P
    ___________________________________________________________________________________
    ExcludedScanFolders= dev etc kdeinit5__0 proc tmp srv sys .snapshots
    ___________________________________________________________________________________

    The code snippets:

    Code:
    # remove everything after ';'
    #printf "%s\n" "$line"
    DTS_tmp1=${line#DateTimeStamp= *}
    #printf "%s\n" "$DTS_tmp1"
    DTS_tmp2=${DTS_tmp1%%;*}
    #printf "%s\n" "$DTS_tmp2"

    # format the date, turn spaces into '_', then drop any other disallowed character
    Date_Time_Stamp=$( date +"$DTS_tmp2" | tr ' ' '_' | sed -e 's/[^A-Za-z0-9:._-]//g')

    Current_Vault_Folder="$Virus_Vault_Folder/linux_"
    Current_Vault_Folder+="$Date_Time_Stamp"
    mkdir "${Current_Vault_Folder}"
    After this, to make later output more readable, I convert '_' to a space.

    I need the folder name to be readable but free of characters that are illegal on either Windows or Linux, while giving the user the most flexibility for the date-time stamp.

    I could either let through only some characters or mask out the illegal ones.
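For the whitelist route, a minimal sketch of the whole path from config line to folder-name stamp might look like this. `get_dts_format` is a hypothetical helper, the trailing "; comment" on the sample line is made up, and masking ":" is an assumption made for Windows compatibility. `%P` is a GNU date extension, as in the config above.

```shell
#!/usr/bin/env bash
# get_dts_format is a hypothetical helper, not part of the original script.
get_dts_format() {
    local fmt
    fmt=${1#DateTimeStamp=}          # drop the key
    fmt=${fmt%%;*}                   # drop everything from the first ';'
    fmt=${fmt#"${fmt%%[! ]*}"}       # trim leading spaces
    fmt=${fmt%"${fmt##*[! ]}"}       # trim trailing spaces
    printf '%s\n' "$fmt"
}

# Assumed config line; the trailing "; comment" is made up for illustration.
line='DateTimeStamp= %Y-%m-%d %I:%M:%S%P ; comment'
fmt=$(get_dts_format "$line")

# Whitelist letters, digits, ".", "_" and "-"; everything else becomes "_".
stamp=$(date +"$fmt" | sed -e 's/[^A-Za-z0-9._-]/_/g')
printf 'linux_%s\n' "$stamp"
```

Replacing with "_" rather than deleting keeps the date and time fields visually separated, which helps if the name is later turned back into readable output.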

  7. #7


    Thanks to everyone for the help.

    I've figured out how to solve the problem.
