awk help needed

Hi all.

I have a bit of a problem here. I’m a bit new to awk, but I think it is the right tool for the following task.

I have files in the following format:


 C
 C      1   1.3756312
 C      2   1.3879733  1   120.0957745
 C      3   1.3850912  2   119.6454502  1     7.8601452  0
 C      4   1.3892896  3   119.7264098  2    -6.4720895  0
 C      5   1.3844873  4   120.0924064  3    -1.2289029  0
 N      3   1.4348439  2   119.2949612  1  -166.5566349  0
 C      7   1.3250991  3   123.0481206  2    18.5755546  0
...

And I need to reformat them into this:


 c
 c   1 cc2
 c   2 cc3        1 ccc3
 c   3 cc4        2 ccc4         1 dih4
 c   4 cc5        3 ccc5         2 dih5
 c   5 cc6        4 ccc6         3 dih6
 n   2 nc7        3 ncc7         4 dih7
 c   7 cn8        2 cnc8         3 dih8
 c   8 cc9        7 ccn9         2 dih9
 n   9 nc10       8 ncc10        7 dih10
...

cc2         1.385627
cc3         1.387278
ccc3        120.462
cc4         1.387224
ccc4        119.224
dih4         11.852
cc5         1.384238
ccc5        119.249
dih5          0.237
cc6         1.387992
ccc6        120.570
dih6        -10.931
nc7         1.442256
ncc7        118.545
dih7       -156.703
cn8         1.381867
cnc8        130.485
...

In case anyone is wondering, yes, I’m dealing with zmatrix here. :wink: But it’s about a thousand of them easily. :stuck_out_tongue:

That’s really not a simple task. Is there any way of doing so in a script? :stuck_out_tongue:

Btw, I have some freedom in the “variable names”. So, in the second file, the 7th column of the first section can just hold “dih” strings, and I can likewise have “bon” and “ang” in the 3rd and 5th columns. :wink:

Almost done; here is what I have so far:

#!/bin/tcsh

set FILE = $1
set N = `cat -n $FILE | awk '{print $1}' | tail -n 1`
set i = 0

# First Atom:
head -n 1 $FILE
# Second Atom:
sed -n -e 2,2p $FILE | awk '{print "", $1, " ", $2, "bon2"}'
sed -n -e 2,2p $FILE | awk '{print "bon2", "      ", $3}' > tmp
# Third Atom:
sed -n -e 3,3p $FILE | awk '{print "", $1, " ", $2, "bon3", "      ", $4, "ang3"}'
sed -n -e 3,3p $FILE | awk '{print "bon3", "      ", $3}' >> tmp
sed -n -e 3,3p $FILE | awk '{print "ang3", "      ", $5}' >> tmp
# All The Others:
set i = 4
while ($i <= $N)
   sed -n -e $i,$i\p $FILE | awk '{print "", $1, " ", $2, "bon" "'"$i"'", "      ", $4, "ang" "'"$i"'", "       ", $6, "dih" "'"$i"'"}'
   sed -n -e $i,$i\p $FILE | awk '{print "bon" "'"$i"'", "      ", $3}' >> tmp
   sed -n -e $i,$i\p $FILE | awk '{print "ang" "'"$i"'", "      ", $5}' >> tmp
   sed -n -e $i,$i\p $FILE | awk '{print "dih" "'"$i"'", "      ", $7}' >> tmp
   @ i = ( $i + 1 )
end
# Now The Variables And Their Values List:
echo ''
cat tmp
rm tmp

#EOF

Just one thing: does anybody have any clue how to get the columns properly aligned? :slight_smile:

Thanks a lot!

Good that you worked it out.

However, you may find in future that it is more elegant, more efficient and less error-prone to do it all inside awk instead of mixing awk and shell. After all, awk is a programming language. To help you, here are some features of awk:

Associative arrays, e.g.:

value["cc2"] = $2;
var = "cc2";
print value[var];

BEGIN and END blocks. They are executed, respectively, before any input is read and after all input has been read. You can use an END block to do the final processing, after you have accumulated the info.

You would put your awk program in a file and run it like this:

awk -f munge.awk < input > output
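A minimal sketch of how an associative array and an END block can work together on the z-matrix input from this thread. The three-line sample and the names value, order and n are assumptions made for illustration; the bon/ang naming follows the first post. The program is passed inline here, but it could equally live in a file and be run with awk -f.

```shell
# Hypothetical three-line sample in the first format from this thread.
input=' C
 C      1   1.3756312
 C      2   1.3879733  1   120.0957745'

result=$(printf '%s\n' "$input" | awk '
    # Main rules: each runs once per input line whose field count matches.
    NF >= 3 { name = "bon" NR; value[name] = $3; order[++n] = name }
    NF >= 5 { name = "ang" NR; value[name] = $5; order[++n] = name }
    # END rule: runs after the last line, so saved values can be printed here.
    END { for (i = 1; i <= n; i++) print order[i], value[order[i]] }
')
printf '%s\n' "$result"
```

The order array is there because a plain "for (k in value)" loop does not promise any particular order in awk.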

You could also do it all in a language like Perl, Python or Ruby.

Hi ken.

Thanks a lot. I still need time to really test if it’s ok, because I’m not sure about how the programs will deal with those “unaligned columns”. But most of the work is done.

I never liked perl. I considered python and ruby, but they were left aside because their learning curves would take time that I can’t afford now. For the same reason, at a certain point I decided not to go for bash just due to RNGs, and to stay on tcsh in order to avoid having to rewrite a lot of work that was already done (despite the “test run” options available in bash that look really nice. :smiley: ).

I also thought about programming straight in awk, but I never found a good enough tutorial on that. The ones I have seen always showed awk programming to be a bit too “clumsy” for my taste, and really hard to get. Taking the task above as an example, I would have to make awk read the first 3 lines in different ways, then the fourth until the end of the file, and redirect part of the results straight to a file while keeping track of the other part to put it just after the full input file was read in. I guess a pure awk program for that would easily get really nasty to read. :stuck_out_tongue:

Again, thanks a lot, and if anyone comes up with a simple trick to properly align the columns (which my script doesn’t do) I would be really grateful! :wink:

I used to do everything with awk, but that was 10 years ago. I looked for my old files but don’t have them anymore.
From what I remember, one can simply set FS (the field separator) to read the records, then conditionally write them in a new order/layout. A ‘man awk’ should bring you the info you need.

Good luck

Hi johannesrs,
I took your question as a challenge and wrote a script in awk (it was good to revisit it after 4-5 years). It’s not the best, but it does the job, the same as yours does. This script deals with only 1 input file at a time. If you need more than 1, I might be able to help (time permitting).
Here is the script:
#! /usr/bin/gawk -f
{
    if (NF >= 7)
    {
        print " "$1,$2,"bon"NR," ",$4,"ang"NR," ",$6,"dih"NR
        variable_values[1,NR]="bon"NR" "$3
        variable_values[2,NR]="ang"NR" "$5
        variable_values[3,NR]="dih"NR" "$7
    }
    else if (NF == 5)
    {
        print " "$1,$2,"bon"NR," ",$4,"ang"NR
        variable_values[1,NR]="bon"NR" "$3
        variable_values[2,NR]="ang"NR" "$5
    }
    else if (NF == 3)
    {
        print " "$1,$2,"bon"NR
        variable_values[1,NR]="bon"NR" "$3
    }
    else
    {
        print " "$0
    }
}

END {
    print ""
    for (col = 1; col <= NR; ++col)
    {
        for (row = 1; row <= 3; ++row)
        {
            if (variable_values[row,col] != "")
            {
                print (variable_values[row,col])
            }
        }
    }
}

You should run it like this:
awk -f ./awkProcessor ./input.file > ./tempfile
where:
awkProcessor is the script (execute rights are only needed if you run it directly as ./awkProcessor instead of through awk -f)
./input.file is your input file
./tempfile is your output file

cheers

@dmera, good work. But also remember you can test NF in the gate expression.

NF>=7 { ... }
NF==5 { ... }
NF==3 { ... }
(a bit tricky for the else case, but not too bad, the values are 1, 2, 4, and 6.)
END { dump out saved info }

or test NR for the record number and act accordingly.
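One common way to handle the else case is to end each gated rule with next, so that a final rule without a pattern only sees lines that nothing above matched. A rough sketch; the four-line sample is cut from the first post and the printed labels are placeholders, not part of the real output format:

```shell
# Hypothetical sample: the first four lines of the input from the first post.
sample=' C
 C      1   1.3756312
 C      2   1.3879733  1   120.0957745
 C      3   1.3850912  2   119.6454502  1     7.8601452  0'

kinds=$(printf '%s\n' "$sample" | awk '
    NF >= 7 { print NR, "bond+angle+dihedral"; next }
    NF == 5 { print NR, "bond+angle"; next }
    NF == 3 { print NR, "bond"; next }
            { print NR, "atom only" }   # reached only when no rule above matched
')
printf '%s\n' "$kinds"
```

Without the next statements the last rule would fire for every line, because a rule with no pattern matches everything.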

@johannesrs, pardon the frankness, but what a wuss you are. A search would have found you any of a number of awk tutes.

As for aligning fields, try using printf with field widths (for example %-8s or %10.3f) in the format string.
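A small sketch of what that could look like for this task. The widths (2, 3, 8 and 10 columns), the three decimals and the "NR + 3" line offset are arbitrary assumptions chosen for the two sample lines; the bon/ang/dih names follow the earlier posts. Only the dihedral values are saved here, to keep the sketch short.

```shell
# Hypothetical sample: two full z-matrix lines taken from the first post.
sample=' C      3   1.3850912  2   119.6454502  1     7.8601452  0
 N      3   1.4348439  2   119.2949612  1  -166.5566349  0'

aligned=$(printf '%s\n' "$sample" | awk '
    {
        n = NR + 3   # assumed atom index: pretend these are lines 4 and 5
        # %-8s left-justifies in 8 columns; %3d and %10.3f right-justify.
        printf " %-2s %3d %-8s %3d %-8s %3d %s\n", tolower($1), $2, "bon" n, $4, "ang" n, $6, "dih" n
        # Collect the aligned variable lines for printing after the matrix.
        vars = vars sprintf("%-8s %10.3f\n", "dih" n, $7)
    }
    END { printf "\n%s", vars }
')
printf '%s\n' "$aligned"
```

A minus sign in the width (%-8s) pads on the right, which suits names; numbers usually read better right-justified, so the decimal points line up when the precision is fixed.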

Thank you, ken_yap, for the tip. Well, @johannesrs will have to pay for the tips (with 25 OpenSuse community replies to new users). I had nothing to do at work while waiting over a month for some user IDs to be created; the bureaucracy related to security is crazy there (as it is at all the big companies, I think).