help with recursive searching+editing

Hello there,

I am a newbie at bash scripting. I apologize in advance for my mistakes etc. :slight_smile:

# part I

I'd like to find a way to search all *.dat files under a certain path (including subdirectories) for the following text:

--------------------------------------------------------------------------------
 Elements with small area
  
  Element  Adjusted nodes 
 --------- -------------- 
     16294       NO     
     17889       NO

and getting the list of all “elements with small area” per file in “ErrorEl.txt”. The output should have this form:

"/path/01/A.dat
bad-el#01
bad-el#02

/path/04/A.dat
bad-el#01
bad-el#02
bad-el#03
"

How to do it?

I already know how to find the .dat files containing a certain string:

c=/path/

grep -R --include="*.dat" "Elements with small area" $c | cut -d: -f1>> ErrorEl.txt

but I don’t know how to then get the element numbers (16294 and 17889 in the example above).


# part II

once I get the list of “bad elements” per “.dat” file like this:

"/path/01/A.dat
13333
48631

/path/01/B.dat
20001

/path/04/A.dat
13333
"

I would like to edit the corresponding “.inp” files which look like this:

/path/01/A.inp
(Elements, node 1, node 2, node 3)

1, 1, 2, 3
2, 2, 3, 4
…
13333, 13330, 13331, 13332
48629, 48450,48449,48444
48630, 48458,48446,48448
48631, 48448,48455,48458
48643, 48467,48468,48469
48644, 48470,48469,48468
48653, 48446,48469,48476
48654, 48469,48470,48477
48655, 48478,48479,48480
48656, 48478,48480,48481

and delete each line whose first field corresponds to a “bad” element found.

For instance, the line in orange should be deleted

Many thanks, Alex

With a file like so

 Elements with small area
  
  Element  Adjusted nodes 
 --------- -------------- 
     16294       NO     
     17889       NO

 Elements with large area
  
  Element  Adjusted nodes 
 --------- -------------- 
     16294       NO     
     17889       NO

 Elements with small area
  
  Element  Adjusted nodes 
 --------- -------------- 
     16294       NO     
     17889       NO

Using awk and regexp

awk 'BEGIN{RS="\n\n"}/Elements with small/' data.txt | awk '$1 ~ /[[:digit:]]/{print $1}'
16294
17889
16294
17889

You’ve gone beyond grep; this really is a job for one of the other tools, i.e. awk or sed.

So now that I’ve brought back the elements, you’ll need to go back to awk/sed and use a regexp anchored with ^ to match them at the start of a line.

I doubt anyone is going to give you the whole solution, but this really is one for regexps now. I stumble through regexps myself, but you’ll find Google has plenty of tutorials, and likewise for awk and sed. As for returning the elements in the format required: you can probably tell awk to do it, but how to do that took me into advanced territory. Since awk is a language in its own right, I suspect you’ll be able to create a variable n and increment it as in any other language, e.g.

awk '{if($0=="200"){$1=id;id++};print}'

I have to admit this is probably the trickiest bit. But look at awk’s NR; I suspect you’ll need to step through and reset the counter.

Now hopefully I’ve given you some stuff to get you started but there are far better awk/sed gurus than myself. So perhaps they’ll jump in and fill in the blanks. But I would start by getting your head around some basic regular expressions.

You can keep everything in one awk process:

# awk 'BEGIN{RS="\n\n"}/Elements with small/{for(i=1;i<=NF;i++)if($i+0==$i){print $i}}' file
16294
17889
16294
17889


Hehe, the infamous Ghostdog74. I come across your replies quite often when solving my own sed/awk problems.

I have to admit that after typing it I realised I could have combined them, but I tend to use sed more, so I was a little unsure how.

Curious: how is that matching digits? I keep looking at it but seem to be missing it. I also noticed your NR seems more sensible than my own. To my limited eyes that says:
for each i up to NF (increment and set), which as I understand it means the fields, i.e. the columns,
if $i+0==$i, which will always match, print the field.

But how come it skips the fields with letters and ---?

(I suspect this’ll help the OP as much as me learning something new :wink: )

Not quite. If you do $i+0==$i, it just checks whether that particular field is a number: adding 0 turns a non-numeric field such as NO into 0, which no longer equals the original text.
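
To see the idiom on its own, here is a small sketch. The function name numeric_fields and the sample lines are made up for illustration (the lines imitate the table above). awk compares numerically only when the field itself looks like a number; otherwise it compares as strings, so NO and the dashed rule fail the test.

```shell
# Print only the fields that look like numbers (hypothetical helper name).
numeric_fields() {
  awk '{ for (i = 1; i <= NF; i++) if ($i + 0 == $i) print $i }'
}

# Two lines in the style of the table: a dashed rule and a data row.
printf '%s\n' ' --------- -------------- ' '     16294       NO     ' | numeric_fields
```

Note that the test accepts any number, decimals included, so on the full .dat file it would also return the quality and angle columns.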

Thanks ghostdog. I had stumbled across a variable being a string or a number depending on the context it is used in; it makes more sense in the morning.

The way I understand this and please correct me if I’m wrong:

  1. collect all elements from the *.dat files and then remove these elements from the corresponding .inp file

Can these elements be collected from all the *.dat files into an array (via awk) and then removed from all the *.inp files?
Is there a chance that an element from 1.dat would end up being wrongly removed from 2.inp?

Please let me know as I have some awk scripts which I can modify for you to do this in a very short time.
cheers

Hello,
thanks to everybody who is trying to help.

I’ll describe my problem in more detail, in the hope it will help.

Neither awk command above works properly with the full-size files (I apologize for not providing these from the beginning).

Here is the full .dat file in which the BAD elements are to be found. They are marked in bold orange.

A.dat

1

   Abaqus 6.9-1                                  Date 28-Oct-2009   Time 10:59:04
 --------------------------------------------------------------------------------
 Distorted triangular elements
  
 Element   Quality measure     Min angle        Max angle     Adjusted nodes 
 --------- ---------------- ---------------- ---------------- -------------- 
       854        0.0285735          8.92741          154.775       NO       
      1369         0.119042          6.79394          122.446       NO       
      1598       0.00530384          5.69038          166.033       NO       
      4843        0.0117931          8.51097            161.9       NO       
      6113        0.0421485          7.56378           148.78       NO       
      6883       0.00531681          2.93227            163.4       NO       
      7690       0.00128284          2.85016             171.       NO       
      9384        0.0879691          6.16314          129.906       NO       
     11692       0.00676797          4.89223          164.197       NO       
     24915       0.00159596          2.70696          170.018       NO       
     27683        0.0436192          9.99342          150.589       NO       
     28762      0.000926279           1.3627          170.141       NO       
     29489        0.0989893          9.39215          136.032       NO       
     33151         0.028436          8.00116          154.229       NO       
     37586          0.22739           8.9118           97.164       NO       
     38469        0.0584867          8.64297          145.121       NO       
     40235       0.00296683          4.64652          168.501       NO       
     43414         0.178509          7.99049          109.666       NO       
     43558        0.0451444          6.26871          145.461       NO       
     45336         0.102684          7.73265           131.12       NO       
     53125        0.0020963           3.9702           169.71       NO       
     54010        0.0308723          8.02855          153.322       NO       
     54057        0.0803942          7.81497          137.525       NO       
     54243         0.120028          8.44665          128.703       NO       
    109775         0.116136          7.00128          124.404       NO       
    113514        0.0111456          4.28059           159.58       NO       
    114987         0.233988          9.53969          101.239       NO       
    122431         0.096072          6.56276          128.892       NO       
    122432         0.143899          8.04068          120.851       NO       
    122799      0.000787392          2.83589          172.576       NO       
    126496        0.0408083          9.98376          151.394       NO       
    129148        0.0126361          4.98988          159.431       NO       
    132272        0.0289734          5.28991          150.345       NO       
    133590        0.0988269          9.91401          136.969       NO       
    136378       0.00123149          3.52893          171.454       NO       
    136394        0.0636887          6.42976          139.169       NO       
    136885        0.0174814          8.38847          159.043       NO       
    137576        0.0275517          6.18311           152.63       NO       
    138203         0.157257          8.93597           120.98       NO       
    141792        0.0691473          8.84235          142.486       NO       
    142044        0.0435199          8.70057          149.617       NO       
    143088         0.141776           7.9458          121.031       NO       
    144956        0.0311378           9.1933          154.014       NO       
    146318         0.169376          8.82381          117.338       NO       
    148354        0.0224185           7.8812          156.606       NO       
    149910        0.0153567            9.155          160.187       NO       
    149946        0.0297652          8.24221          153.904       NO       
    152507        0.0285193          9.01808          154.842       NO       
    152807         0.213074           9.6235          109.163       NO       
    154788         0.259903          9.79209          85.7671       NO       
    156093         0.125212          8.81591          128.496       NO       
    156358        0.0529421          9.40505          147.523       NO       
    157085        0.0784916          9.26255          140.719       NO       
    158735        0.0360983          7.64655          151.024       NO       
    160665       0.00952519          6.82261          162.948       NO       
    162283         0.195645          7.33468          87.6524       NO       
    162818       0.00030213          1.88675          174.529       NO       
    165206         0.198405           9.2708          111.465       NO       
    165582        0.0854326          8.85228          138.314       NO       
    165676         0.249222          9.56834           93.183       NO       
    167198         0.139186           8.4139           123.66       NO       
    168851           0.1898          8.73034          110.888       NO       
    169674        0.0359503          7.25888          150.616       NO       
    170381         0.222759          9.60879          106.003       NO       
    171207        0.0285416          3.28161          143.623       NO       
    171464       0.00697484          5.02717          164.097       NO       
    171576         0.118197          4.98263           107.26       NO       
    176258         0.191241          8.28905          107.259       NO       
    176700       0.00490373          4.56343          165.947       NO       
    177782         0.228493          9.84865          105.778       NO       
    177962       0.00114734          3.75269          171.715       NO       
    179021        0.0244036          4.65751          151.332       NO       
    179413         0.129026          7.80306          124.043       NO       
  
  
 --------------------------------------------------------------------------------
 Elements with small area
  
  Element  Adjusted nodes 
 --------- -------------- 
     **439 **      NO       
     **440**	 NO	 



                            P R O B L E M   S I Z E


          NUMBER OF ELEMENTS IS                                 58687
          NUMBER OF NODES IS                                    40001
          NUMBER OF NODES DEFINED BY THE USER                   40001
          TOTAL NUMBER OF VARIABLES IN THE MODEL                40001
          (DEGREES OF FREEDOM PLUS ANY LAGRANGE MULTIPLIER VARIABLES)





          THE PROGRAM HAS DISCOVERED     1 FATAL ERRORS

               ** EXECUTION IS TERMINATED **



                              END OF USER INPUT PROCESSING



     JOB TIME SUMMARY
       USER TIME (SEC)      =   3.5300    
       SYSTEM TIME (SEC)    =  0.15000    
       TOTAL CPU TIME (SEC) =   3.6800    
       WALLCLOCK TIME (SEC) =          4

and here is A.inp to be modified (lines which begin with bad elements must be deleted).

*Node
439,0.132564,0.001137
440,0.134263,0.001069
478,0.134967,0.000611
479,0.135379,0.001142
480,0.137209,0.002212
481,0.136382,0.001698
482,0.137539,0.001231
483,0.136193,0.002135
484,0.135500,0.002459
485,0.135604,0.001725
486,0.134970,0.002100
487,0.141110,0.001470
488,0.141143,0.002361
489,0.139858,0.002298
490,0.143474,0.002265
491,0.142813,0.002073
492,0.143749,0.001748
493,0.141699,0.001913
494,0.141927,0.002361
495,0.142661,0.001041
496,0.142860,0.001350
497,0.142370,0.001045
498,0.142440,0.001492
*Element, type=DC2D3
432,477,478,479
433,480,481,482
434,483,484,485
435,486,485,484
436,487,488,489
437,490,491,492
438,493,488,487
439,494,488,493 --> line to be deleted
440,495,496,497 --> line to be deleted
441,491,498,496
442,496,498,497
443,493,499,494
444,500,487,501
445,500,493,487
446,493,500,499
447,499,500,498
448,502,503,504
449,505,506,507
450,508,507,509
451,506,509,507
452,510,511,512
453,510,506,505
454,510,513,506
455,514,505,515
*Elset, elset=C
484,490,514,515,545,547,554,555,558,760,764,765,772,774,780,789
792,794,807,810,812,813,814,816,817,819,903,904,907,922,923,925
439,440 --> put there intentionally for testing

As for part II (doing the above task recursively for files A.inp to D.inp in folders 01 to 03), I hope this will help:

/path/01/A-D.inp (4 files)
/path/02/A-D.inp (4 files)
/path/03/A-D.inp (4 files)

Thanks again. I apologize in advance for my limited prog skill/knowledge.

ah, maybe this wasn’t clear:

I have to delete only lines within “*Element, type=DC2D3” and “*Elset, elset=C”.

You can wait for the guys who know or keep learning as I am :wink:

Google really is your friend…
I’ve been in here a bit.
The AWK Manual - Table of Contents
Using your example

awk 'BEGIN{RS="--"}/'"P R O B L E M   S I Z E"'/{for(i=1;i<=NF;i++)if($i == "NO"){n=i-1;print $n}}' data.txt

RS is the record separator, so here I changed it to --, which makes the chunks more manageable. Then I found the record with P R O B L E M   S I Z E in it and looped over its fields. Matching digits as before brought back too much, so instead I chose “NO” (which relies on Adjusted nodes always being NO) and printed the field just before each NO match.
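
A different way to get the same numbers, offered as a sketch and not as what the command above does, is to keep RS at its default and track a flag: switch it on at the “Elements with small area” heading, print the first field of every row that is a plain integer, and switch it off at the first other line after the rows. The function name small_area_elements is hypothetical, and the sample is a cut-down copy of the A.dat listing. It assumes each table row starts with the element number.

```shell
# Print the element numbers listed under every "Elements with small area"
# heading of one .dat file (hypothetical helper name).
small_area_elements() {
  awk '
    /Elements with small area/  { in_table = 1; seen = 0; next }  # heading found
    in_table && $1 ~ /^[0-9]+$/ { print $1; seen = 1; next }      # a table row
    in_table && seen            { in_table = 0 }                  # first non-row after the rows
  ' "$1"
}

# Try it on a cut-down copy of the A.dat listing.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 Distorted triangular elements

 Element   Quality measure     Min angle        Max angle     Adjusted nodes
 --------- ---------------- ---------------- ---------------- --------------
       854        0.0285735          8.92741          154.775       NO

 --------------------------------------------------------------------------------
 Elements with small area

   Element  Adjusted nodes
  --------- --------------
      439       NO
      440       NO

          NUMBER OF ELEMENTS IS                                 58687
EOF
small_area_elements "$sample"   # prints 439 and 440, one per line
rm -f "$sample"
```

Because it does not depend on NO or on the P R O B L E M   S I Z E text, this version should also cope with YES in the second column.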

I’m sure one of these guys will be way more elegant than myself, and from the sounds of things, dmera has something ready made.

Many Thanks FM!

I am almost done with the following code (thanks also to another no-awk guy here):

#!/bin/bash

c=/path/
cd $c

#crop bad elements between "small area" and "P R O B L E M"
awk 'BEGIN{RS="--"}/'"P R O B L E M   S I Z E"'/{for(i=1;i<=NF;i++)if($i == "NO"){n=i-1;print $n}}' DAT.dat > BAD.dat


while read line
do
  P="$line"
  echo $P
  (echo "/Element, type"; echo ":/[0-9]"; echo "/$P/d";echo "wq!")|ex -s INPUT.inp

done < BAD.dat

There is only one problem left: when a BAD element appears as the second, third or fourth item in a line, that line should not be deleted:

*Element, type=DC2D3
432,477,478,479
433,**439**,481,482 --> line NOT to be deleted
434,483,484,**440** --> line NOT to be deleted
435,486,485,484
436,487,488,489
437,490,491,492
438,493,488,487
**439**,494,488,493 --> line to be deleted
**440**,495,496,497 --> line to be deleted
441,491,498,496
442,496,498,497
443,493,499,494
444,500,487,501
445,500,493,487
446,493,500,499
447,499,500,498
448,502,503,504
449,505,506,507
450,508,507,509
451,506,509,507
452,510,511,512
453,510,506,505
454,510,513,506
455,514,505,515
*Elset, elset=C

Thanks! :slight_smile:

As I’m enjoying the learning curve …

You really are missing the beauty of awk (Bloody terrible syntax though)
There is no need for a while loop

awk 'BEGIN{FS=","}NR==FNR{a[$1];next} $1 in a{print $0}' check.txt dat1.txt

OK, the above still doesn’t do what you want, but it does show that there is no need to loop. The problem is that it still brings back matching lines from the earlier and later sections.
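
Building on the array idea, here is a sketch of one way to restrict the match to the *Element section and to the first field only. It is an assumption-laden illustration, not the command above: the function name drop_bad_elements is hypothetical, the bad list is expected to hold one element number per line, and every line starting with * is treated as a keyword line that opens a new section. The list is read in BEGIN with getline instead of NR==FNR, so that an empty list still passes the file through unchanged.

```shell
# Usage: drop_bad_elements BAD_LIST INP_FILE   (hypothetical helper name)
# Prints INP_FILE without the *Element rows whose first field is in BAD_LIST.
drop_bad_elements() {
  awk -F',' -v list="$1" '
    BEGIN {
      # read the bad element numbers, one per line; n + 0 drops stray blanks
      while ((getline n < list) > 0) if (n ~ /[0-9]/) bad[n + 0]
      close(list)
    }
    /^\*/ { in_elem = ($0 ~ /^\*Element/); print; next }  # keyword line switches section
    in_elem && (($1 + 0) in bad) { next }                 # bad row inside *Element: drop it
    { print }                                             # everything else is kept
  ' "$2"
}

# Example call (file names taken from the script earlier in the thread):
#   drop_bad_elements BAD.dat INPUT.inp > INPUT.new
```

A row such as 433,439,481,482 is kept because only the first field is looked up, and the 439,440 line under *Elset is kept because it lies outside the *Element section.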

Now I just can’t get my head around combining the expressions; hopefully one of the gurus will fix it. I did manage to achieve what you first wanted, though without the 0 padding :wink:
**(note)

for i in $(ls -l | egrep -v "^d|^l" | awk '{print $9}');
do awk 'BEGIN{str=01;RS="--"}/'"P R O B L E M   S I Z E"'/{{print "*Element"}
for(i=1;i<=NF;i++)
if($i == "NO"){n=i-1;print $n}
else if ($i=="YES"){n=i-1;print $n}{print "*Elset"}}' $i;
done

This is actually missing the count and the #. The variable is there and just needs printing. It also doesn’t presume just NO but takes YES into account too. If you change print $n to print $n "#" str; str++ you’ll have the format you wanted (but then it won’t match for the delete). If you change “*Element” to FILENAME (or even print it beforehand, which should work inside the {}), it’ll put the filename at the top of the block, but… this uses *El as the record separator.

As mentioned I’ve had to crudely hack it with pipes hopefully someone will come along and tidy it up and I’ll learn something new.

awk 'BEGIN{FS=","}NR==FNR{a[$1];next} $1 in a{print $0}' check.txt dat1.txt | awk 'BEGIN{RS="\\*El";FS="\n"}/type=/{print $0}' | awk '/^[[:digit:]]/{print $0}'

Now personally I would then pipe this to sed -i, as that will update the files in place, but carefully test and test again (did I say test?) using test data, as there will be no undo (OK, looking at the man page, maybe you’ll have a backup).

**
The for loop won’t actually do as intended, as you won’t have paths if you use the recursive flag; it looks like this would be one for find. So the for loop is rather superfluous and could be written as for i in .
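
As a sketch of what “one for find” could look like: find walks the subdirectories and hands the .dat files, with their paths, to grep -l, which prints only the names of the files containing the heading. The function name files_with_small_area is hypothetical, and the directory is passed as an argument, so nothing here depends on /path/.

```shell
# List every .dat file under a directory that contains the heading
# (hypothetical helper name). Paths are printed as find sees them.
files_with_small_area() {
  find "$1" -type f -name '*.dat' -exec grep -l 'Elements with small area' {} +
}

# Example call:
#   files_with_small_area /path/ > ErrorEl.txt
```

This gives the same list as the grep -R ... | cut -d: -f1 pipeline from the first post, but each file is listed once even if the heading occurs several times in it.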

edit
If you change the print to…

{print "*Element" "\n" FILENAME}

You’ll have the filename in each block; as to how you’ll use it, I leave that one up to you.

Hello,
thanks for the suggestion, but I still prefer the echo “/string” style with a loop. I solved my problem above just by adding ^ before $P.
This code does exactly what I wanted but for just a single file:

#!/bin/bash

c=/path/
cd $c

#crop bad elements between "small area" and "P R O B L E M"
awk 'BEGIN{RS="--"}/'"P R O B L E M   S I Z E"'/{for(i=1;i<=NF;i++)if($i == "NO"){n=i-1;print $n}}' DAT.dat > BAD.dat


while read line
do
  P="$line"
  echo $P
  (echo "/Element, type"; echo ":/[0-9]"; echo "/^$P/d";echo "wq!")|ex -s INPUT.inp

done < BAD.dat

Making this code recursive over several files is the next challenge now (part II).
Thanks!

Sorry that I couldn’t help earlier, but sometimes I have to work at work.
Anyway, this is not what I would have liked the script to look like (I was thinking of a complete awk solution, but time was not on my side).
Just to help get the last step over with, here is the script you have, to which I added just a loop to do multiple files (assuming they are in the same directory).

#!/bin/bash

#c=/path/
#cd $c

#crop bad elements between "small area" and "P R O B L E M"
awk 'BEGIN{RS="--"}/'"P R O B L E M   S I Z E"'/{for(i=1;i<=NF;i++)if($i == "NO"){n=i-1;print $n}}' A.dat > BAD.dat
all_files="/home/dan/progr_task/*.inp"
for the_file in $all_files; do
  echo "$the_file is my file"
  while read line
  do
    P="$line"
    # echo $P
    (echo "/Element, type"; echo ":/[0-9]"; echo "/^$P/d";echo "wq!")|ex -s $the_file

  done < BAD.dat

done
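
For the recursive case in part II, where each X.inp should only lose the elements reported in its own X.dat, a sketch along the following lines might serve as a starting point. It is not the script above: the function name clean_tree and the BAD_LIST environment variable are hypothetical, it assumes that X.dat and X.inp sit in the same folder, that table rows start with the element number, and that file paths contain no newlines. It prints a report in the ErrorEl.txt layout from the first post and keeps the original as X.inp.bak; running it twice overwrites that backup, so try it on copies first.

```shell
# Usage: clean_tree DIR   (hypothetical helper name)
# For each DIR/.../X.dat that lists small-area elements, print the report and,
# if a sibling X.inp exists, rewrite it without the bad *Element rows.
clean_tree() {
  find "$1" -type f -name '*.dat' | sort | while IFS= read -r dat; do
    inp="${dat%.dat}.inp"
    # element numbers under "Elements with small area", one per line
    bad=$(awk '
      /Elements with small area/ { t = 1; seen = 0; next }
      t && $1 ~ /^[0-9]+$/       { print $1; seen = 1; next }
      t && seen                  { t = 0 }
    ' "$dat")
    [ -n "$bad" ] || continue            # nothing reported in this file
    printf '%s\n%s\n\n' "$dat" "$bad"    # report: path, then one number per line
    [ -f "$inp" ] || continue            # no matching .inp: report only
    cp -p "$inp" "$inp.bak"              # keep the untouched original
    BAD_LIST="$bad" awk -F',' '
      BEGIN { n = split(ENVIRON["BAD_LIST"], b, "\n"); for (k = 1; k <= n; k++) drop[b[k] + 0] }
      /^\*/ { in_elem = ($0 ~ /^\*Element/); print; next }  # keyword line switches section
      in_elem && (($1 + 0) in drop) { next }                # bad row: drop it
      { print }
    ' "$inp.bak" > "$inp"
  done
}

# Example call:
#   clean_tree /path/ > ErrorEl.txt
```

Since the bad list is rebuilt for every .dat file, an element reported in /path/01/A.dat is never removed from /path/04/A.inp.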