
Thread: awk help needed

  1. #1
    Join Date
    Sep 2008
    Location
    Porto Alegre, RS, Brazil
    Posts
    124

    awk help needed

    Hi all.

    I have a bit of a problem here. I'm a bit new to awk, but I think it's the right tool for the following task.

    I have files in the following format:

    Code:
     C
     C      1   1.3756312
     C      2   1.3879733  1   120.0957745
     C      3   1.3850912  2   119.6454502  1     7.8601452  0
     C      4   1.3892896  3   119.7264098  2    -6.4720895  0
     C      5   1.3844873  4   120.0924064  3    -1.2289029  0
     N      3   1.4348439  2   119.2949612  1  -166.5566349  0
     C      7   1.3250991  3   123.0481206  2    18.5755546  0
    ...
    And I need to reformat them into this:

    Code:
     c
     c   1 cc2
     c   2 cc3        1 ccc3
     c   3 cc4        2 ccc4         1 dih4
     c   4 cc5        3 ccc5         2 dih5
     c   5 cc6        4 ccc6         3 dih6
     n   2 nc7        3 ncc7         4 dih7
     c   7 cn8        2 cnc8         3 dih8
     c   8 cc9        7 ccn9         2 dih9
     n   9 nc10       8 ncc10        7 dih10
    ...
    
    cc2         1.385627
    cc3         1.387278
    ccc3        120.462
    cc4         1.387224
    ccc4        119.224
    dih4         11.852
    cc5         1.384238
    ccc5        119.249
    dih5          0.237
    cc6         1.387992
    ccc6        120.570
    dih6        -10.931
    nc7         1.442256
    ncc7        118.545
    dih7       -156.703
    cn8         1.381867
    cnc8        130.485
    ...
    In case anyone is wondering, yes, I'm dealing with z-matrices here. But there are easily about a thousand of them.

    That's really not a simple task. Is there any way of doing so in a script?

    Btw, I have some freedom in the "variable names". So, just as I can have only "dih" strings in the 7th column of the first section of the second file, I can also have "bon" and "ang" in the 3rd and 5th columns.

  2. #2
    Join Date
    Sep 2008
    Location
    Porto Alegre, RS, Brazil
    Posts
    124

    Re: awk help needed

    Pretty much done here:

    Code:
    #!/bin/tcsh
    
    set FILE = $1
    set N = `cat -n $FILE | awk '{print $1}' | tail -n 1`
    set i = 0
    
    # First Atom:
    head -n 1 $FILE
    # Second Atom:
    sed -n -e 2,2p $FILE | awk '{print "", $1, " ", $2, "bon2"}'
    sed -n -e 2,2p $FILE | awk '{print "bon2", "      ", $3}' > tmp
    # Third Atom:
    sed -n -e 3,3p $FILE | awk '{print "", $1, " ", $2, "bon3", "      ", $4, "ang3"}'
    sed -n -e 3,3p $FILE | awk '{print "bon3", "      ", $3}' >> tmp
    sed -n -e 3,3p $FILE | awk '{print "ang3", "      ", $5}' >> tmp
    # All The Others:
    set i = 4
    while ($i <= $N)
       sed -n -e $i,$i\p $FILE | awk '{print "", $1, " ", $2, "bon" "'"$i"'", "      ", $4, "ang" "'"$i"'", "       ", $6, "dih" "'"$i"'"}'
       sed -n -e $i,$i\p $FILE | awk '{print "bon" "'"$i"'", "      ", $3}' >> tmp
       sed -n -e $i,$i\p $FILE | awk '{print "ang" "'"$i"'", "      ", $5}' >> tmp
       sed -n -e $i,$i\p $FILE | awk '{print "dih" "'"$i"'", "      ", $7}' >> tmp
       @ i = ( $i + 1 )
    end
    # Now The Variables And Their Values List:
    echo ''
    cat tmp
    rm tmp
    
    #EOF
    Just one thing: does anybody have any idea how to get the columns properly aligned?

    Thanks a lot!

  3. #3
    Join Date
    Jun 2008
    Location
    UTC+10
    Posts
    9,683
    Blog Entries
    4

    Re: awk help needed

    Good that you worked it out.

    However you may find that in future, it may be more elegant, more efficient and less error-prone to do it all inside awk instead of mixing awk and shell. After all, awk is a programming language. To help you here are some features of awk:

    Associative arrays, e.g.:

    value["cc2"] = $2;
    var="cc2";
    print value[var];

    BEGIN and END blocks. They are executed, respectively, before any lines of input are read and after all of them have been read. You can use an END block to do the final processing, after you have accumulated the info.

    You would put your awk program in a file and run it like this:

    awk -f munge.awk < input > output
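A minimal sketch of the two features just described, wrapped in a POSIX shell function so it can be tried on a file or on standard input. The function name `collect_bonds` and the `bon` prefix are only illustrative, and it assumes, as in the first post, that every line with at least three fields has a bond length in field 3. It stores those lengths in an associative array while reading and prints them from the END block:

```shell
#!/bin/sh
# Sketch (hypothetical name): collect bond lengths in an associative array,
# then print them from an END block once all input has been read.
collect_bonds() {
    awk '
    BEGIN { n = 0 }                 # runs once, before any input (awk would default n to 0 anyway)
    NF >= 3 {
        name = "bon" NR             # e.g. "bon2" for the second line
        value[name] = $3            # associative array indexed by a string
        order[++n] = name           # remember input order for printing
    }
    END {                           # runs once, after the last line is read
        for (i = 1; i <= n; i++)
            print order[i], value[order[i]]
    }' "$@"
}

# usage: collect_bonds input.file
```

Because awk's `for (name in value)` loop visits keys in no particular order, the sketch keeps a separate numbered `order` array to print the names in input order.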

    You could also do it all in a language like Perl, Python or Ruby.

  4. #4
    Join Date
    Sep 2008
    Location
    Porto Alegre, RS, Brazil
    Posts
    124

    Re: awk help needed

    Hi ken.

    Thanks a lot. I still need time to really test whether it's OK, because I'm not sure how the programs will deal with those "unaligned columns". But most of the work is done.

    I never liked perl. I considered python and ruby, but set them aside because of their learning curves, which take time I can't afford now. For the same reason, at some point I decided not to switch to bash just because of RNGs, and to stay with tcsh to avoid rewriting a lot of work that was already done (despite the "test run" options available in bash, which look *really* nice).

    I also thought about programming directly in awk, but I never found a good enough tutorial on that. The ones I've seen always made awk look a bit too "clumsy" as a programming language for my taste, and really hard to get. Taking the task above as an example, I would have to make awk read the first 3 lines in different ways, then the fourth line until the end of the file, and write part of the results straight to the output while keeping track of the other part to print only after the whole input file has been read. I guess a pure awk program for that would easily get *really* nasty to read.
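For comparison, the structure described above (lines 1, 2 and 3 handled separately, line 4 onwards handled alike, and the variable list held back until the whole file is read) maps fairly directly onto awk's pattern/action rules. This is a sketch assuming the input is a single z-matrix laid out like the first post, with no blank lines; `zmat_reformat` is a hypothetical name, and the `bon`/`ang`/`dih` names follow the tcsh script, using single spaces instead of padding:

```shell
#!/bin/sh
# Sketch (hypothetical name): one awk program, dispatching on the line number NR.
zmat_reformat() {
    awk '
    NR == 1 { print " " $1; next }              # first atom: element symbol only
    NR == 2 {                                    # second atom: one bond
        print " " $1, $2, "bon" NR
        vars[++n] = "bon" NR " " $3
        next
    }
    NR == 3 {                                    # third atom: bond and angle
        print " " $1, $2, "bon" NR, $4, "ang" NR
        vars[++n] = "bon" NR " " $3
        vars[++n] = "ang" NR " " $5
        next
    }
    {                                            # line 4 onwards: bond, angle, dihedral
        print " " $1, $2, "bon" NR, $4, "ang" NR, $6, "dih" NR
        vars[++n] = "bon" NR " " $3
        vars[++n] = "ang" NR " " $5
        vars[++n] = "dih" NR " " $7
    }
    END {                                        # the deferred "name value" section
        print ""
        for (i = 1; i <= n; i++) print vars[i]
    }' "$@"
}

# usage: zmat_reformat input.file > output.file
```

The first section is printed as each line arrives, while the second section only lives in the `vars` array until the END block prints it.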

    Again, thanks a lot, and if anyone comes up with a simple trick to properly align the columns (which my script doesn't do), I would be really grateful!

  5. #5
    Join Date
    Jun 2008
    Location
    Groningen, Netherlands
    Posts
    20,925
    Blog Entries
    14

    Re: awk help needed

    I used to do everything with awk, but that was 10 years ago. I looked for old files but don't have them anymore.
    From what I remember, you can simply set FS (the field separator) to read the records, then conditionally write them in a new order/layout. 'man awk' should give you the info you need.

    Good luck

    ° Perfection is not gonna happen. No way.

    http://en.opensuse.org/User:Knurpht
    http://nl.opensuse.org/Gebruiker:Knurpht

  6. #6
    Join Date
    Sep 2008
    Location
    Toronto,Canada
    Posts
    549

    Re: awk help needed

    Hi johannesrs,
    I found your question a challenge, so I wrote a script in awk (it was good to remember it after 4-5 years). It's not the best, but it does the job, the same as yours does. This script deals with only 1 input file at a time. If you need more than 1, I might be able to help (time permitting).
    Here is the script:
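    #! /usr/bin/gawk -f
    # Main rule: runs for every input line; NF (number of fields)
    # tells which kind of z-matrix line this is.
    {
        if (NF >= 7)
        {
            # Bond, angle and dihedral (in the sample, the trailing 0 column makes NF == 8).
            print " "$1,$2,"bon"NR," ",$4,"ang"NR," ",$6,"dih"NR
            variable_values[1,NR]="bon"NR" "$3
            variable_values[2,NR]="ang"NR" "$5
            variable_values[3,NR]="dih"NR" "$7
        }
        else if (NF == 5)
        {
            # Third atom: bond and angle only.
            print " "$1,$2,"bon"NR," ",$4,"ang"NR
            variable_values[1,NR]="bon"NR" "$3
            variable_values[2,NR]="ang"NR" "$5
        }
        else if (NF == 3)
        {
            # Second atom: bond only.
            print " "$1,$2,"bon"NR
            variable_values[1,NR]="bon"NR" "$3
        }
        else
        {
            # First atom (element symbol only): copy the line through.
            print " "$0
        }
    }

    # Runs once after the whole file is read: print the saved "name value"
    # pairs line by line (index 1 = bond, 2 = angle, 3 = dihedral).
    END {
        print ""
        for (col = 1; col <= NR; ++col)
        {
            for (row = 1; row <= 3; ++row)
            {
                if ((row, col) in variable_values)
                {
                    print variable_values[row,col]
                }
            }
        }
    }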

    you should run it like this:
    awk -f './awkProcessor' ./input.file > ./tempfile
    where:
    awkProcessor is the script (execute rights are only needed if you run it directly as ./awkProcessor through its #! line; awk -f just needs to read it)
    ./input.file is your input file
    ./tempfile is your output file
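Since the first post mentions about a thousand files, one way to run this script over all of them is a plain shell loop. This is a sketch with hypothetical names: `batch_reformat` takes the awk script first and the input files after it, and writes each result next to its input with `.out` appended to the file name:

```shell
#!/bin/sh
# Sketch (hypothetical name and output naming): run one awk script over many
# input files, producing one output file per input.
batch_reformat() {
    script=$1                                       # path to the awk program
    shift
    for f in "$@"; do
        awk -f "$script" "$f" > "$f.out" || return 1   # stop at the first failure
    done
}
```

For example, `batch_reformat ./awkProcessor *.zmat`, where the `.zmat` extension is only an assumption about how the input files are named.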

    cheers

  7. #7
    Join Date
    Jun 2008
    Location
    UTC+10
    Posts
    9,683
    Blog Entries
    4

    Re: awk help needed

    @dmera, good work. But also remember you can test NF in the gate expression.

    Code:
    NF>=7 { ... }
    NF==5 { ... }
    NF==3 { ... }
    (the else case is a bit tricky, but not too bad: the remaining values are 1, 2, 4 and 6.)
    END { dump out saved info }
    or test NR for the record number and act accordingly.
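As a sketch of the NF gates, here is dmera's logic written as separate pattern rules instead of an if/else chain. The function name `zmat_by_nf` is hypothetical, and the output uses single spaces rather than the extra `" "` fields of the original. Note that a blank line (NF == 0) also lands in the "else" rule:

```shell
#!/bin/sh
# Sketch (hypothetical name): one pattern ("gate") per kind of line, tested on NF.
zmat_by_nf() {
    awk '
    NF >= 7 {                                    # bond, angle and dihedral
        print " " $1, $2, "bon" NR, $4, "ang" NR, $6, "dih" NR
        vars[++n] = "bon" NR " " $3
        vars[++n] = "ang" NR " " $5
        vars[++n] = "dih" NR " " $7
    }
    NF == 5 {                                    # bond and angle
        print " " $1, $2, "bon" NR, $4, "ang" NR
        vars[++n] = "bon" NR " " $3
        vars[++n] = "ang" NR " " $5
    }
    NF == 3 {                                    # bond only
        print " " $1, $2, "bon" NR
        vars[++n] = "bon" NR " " $3
    }
    NF < 7 && NF != 5 && NF != 3 { print " " $0 }    # the "else" case
    END {
        print ""
        for (i = 1; i <= n; i++) print vars[i]
    }' "$@"
}
```

The three numbered gates are mutually exclusive, so each line is handled by exactly one rule, and the last gate is simply the negation of the other three.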

    @johannesrs, pardon the frankness, but what a wuss you are. A search would have found you any of a number of awk tutes.

    As for aligning fields, try using printf with \t in the format string.
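A small sketch of the printf approach for the "name value" list. The tab version is the suggestion above; it lines up as long as every name is shorter than the tab stop (usually 8 columns). The fixed-width version is an alternative using printf field widths. The function names and the widths 8 and 12 are arbitrary choices:

```shell
#!/bin/sh
# Tabs: simple, aligned while every name fits before the next tab stop.
align_tabs() {
    awk '{ printf "%s\t%s\n", $1, $2 }' "$@"
}

# Fixed widths: %-8s left-justifies the name in 8 characters,
# %12s right-justifies the value in 12 characters.
align_fixed() {
    awk '{ printf "%-8s%12s\n", $1, $2 }' "$@"
}
```

With `%12s` the values are right-justified, so their decimal points line up only when they all have the same number of decimals, which holds for the seven-decimal values in the first post.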

  8. #8
    Join Date
    Sep 2008
    Location
    Toronto,Canada
    Posts
    549

    Re: awk help needed

    Thank you, ken_yap, for the tip. Well, @johannesrs will have to pay for the tips (with 25 OpenSuse community replies to new users). I had nothing to do at work while waiting over a month for some user IDs to be created; the security bureaucracy here is crazy (as it is at all big companies, I think).
