It's not me that needs convincing. The point is that it took a few posts from me to get you to this point, where you agree with me that it has to be measured, and not just assumed.
well, i shall do you this favor
# ls -1
file
# head -5 file
1 col1 col2 col3 col4
2 col1 col2 col3 col4
3 col1 col2 col3 col4
4 col1 col2 col3 col4
5 col1 col2 col3 col4
# tail -5 file
19996 col1 col2 col3 col4
19997 col1 col2 col3 col4
19998 col1 col2 col3 col4
19999 col1 col2 col3 col4
20000 col1 col2 col3 col4
# time while read a b c d; do echo $a; done < file > /dev/null
real 0m1.296s
user 0m1.212s
sys 0m0.084s
# time while read a b c d; do echo $a; done < file > /dev/null
real 0m1.327s
user 0m1.236s
sys 0m0.088s
# time awk '{print $1}' file > /dev/null
real 0m0.026s
user 0m0.024s
sys 0m0.000s
# time awk '{print $1}' file > /dev/null
real 0m0.027s
user 0m0.024s
sys 0m0.004s
# time perl -lane 'print @F[0];' file >/dev/null
real 0m0.217s
user 0m0.204s
sys 0m0.008s
# time perl -lane 'print @F[0];' file >/dev/null
real 0m0.221s
user 0m0.212s
sys 0m0.008s
Good, so you've proved it is less efficient on a large file. But for a 20000 line file, it only took just over 1 second. Does it matter in practice? That depends on the application. If you were going to do this task day after day, then you would want to make it efficient. If you wanted to write it quickly, or you expect the conditions to change (say you need to match not just blank lines but also comment lines), then you might choose awk or perl or python or ruby. But suppose you had an embedded router running Linux and you don't have room on it for awk, let alone perl. In such a case it would be justified to do the task using what you have.
That's all I'm saying, really: take into account all the factors.
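To make the comment-line case concrete, here is a rough sketch (the file name data.txt and the "#" comment convention are my assumptions, adjust to taste):

awk '/^[[:space:]]*($|#)/ { next } { print $1 }' data.txt

and the same in pure bash, for the no-awk router scenario:

while read -r a rest; do
    case $a in ''|"#"*) continue ;; esac    # skip blank and comment lines
    echo "$a"
done < data.txt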
yes, it does. Whether it's going to be a big file or a small file, we don't know. awk is efficient at parsing files, as demonstrated, plus it has the advantage over bash's while-read loop on large files; therefore, to make code "resilient" to unforeseen future changes, use the better method. why should one wait for things to happen before doing something?
As for your second paragraph (about the embedded router): of course, in a case where awk/perl is unavailable (though i am quite sure awk will be there, unless it's a prehistoric router), using what one has is inevitable. But aren't your points going OT with regard to the OP's post? i am fairly sure he's not going to do this from a router with just bash.
Actually I’m surprised how good it is, 1.3 seconds for a 20000 line file is not bad. I suspect that the meme that sh is inefficient came from the days when expr and test were external programs. Now that they are built-ins, that slowdown is gone. I suspect that if someone were to put some effort into JIT compilation for bash, it would be as fast as the other programming languages. But since bash is mostly used for process flow control, nobody feels the urgency.
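For instance, you can check the builtin point yourself. Assuming your system still ships an external binary at /usr/bin/test, compare:

time for i in $(seq 2000); do test "$i" -gt 0; done
time for i in $(seq 2000); do /usr/bin/test "$i" -gt 0; done

The first loop never leaves the shell; the second forks and execs two thousand times, which is where the old reputation came from.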
The programmer should ask what the size of the data is. Is it a config file that he's filtering? Or some 20000 line financial printout? That would be one of the factors for deciding. Personally I feel that the execution time of 1.3 seconds is not significant; you would waste more time just thinking about it. For me it would be whether the code is easy to develop and maintain. So I would probably go for awk or perl anyway, for the factors you have stated.
Which is why I found the OP’s desire to use bash a bit strange. But maybe he just wanted the data processed, whatever the language.
note that my example only gets the first column. for more complex processing, it's a different story.
The programmer should ask what the size of the data is. Is it a config file that he’s filtering?
while different people will have different views, i would think it's not necessary, because as a good programmer, he should think of all the possibilities and find the best way to go about it.
Well, the best way depends on the function of the script and the expected input, which is why there are so many alternatives. So yes, it's necessary that a good programmer look at all the angles.
For instance, here is another situation where you might want to do the processing in shell. Suppose a config file is being parsed for program arguments before invoking the main program. If you were to do it in an external script, the results would have to be passed back to the shell, usually using backticks. Backticks are problematic for multiple arguments. If the processing is done in shell, you can use the variables directly. IIRC the shell wrapper for pure-ftpd does this sort of thing.
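A minimal sketch of that pattern (the daemon name, config path, and keys here are all hypothetical):

#!/bin/sh
# build the daemon's argument list in the same shell that parses the config
set --                                  # start with an empty argument list
while read -r key value; do
    case $key in
        port) set -- "$@" -p "$value" ;;
        root) set -- "$@" -r "$value" ;;
    esac
done < /etc/myapp.conf
exec /usr/sbin/myappd "$@"

Because "$@" is passed through quoted, each argument survives intact; had the parsing been done externally and captured with backticks, the list would be re-split on whitespace on the way back in.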
I tried this one but it doesn't work for my case because it prints only the first word of my lines. But I got a response from another forum and, with a little modification, it works just fine.
Here is the solution:
awk '/^$/{getline a; print a}' < file
Thank you all for interest. I really appreciate your help!
no need to use input redirection. just pass the file name to awk normally. Also, the above will fail if you have more than one consecutive blank line (ie several /^$/ lines in a row).
awk 'BEGIN{ RS=""}NR>1{ print $1}' file
If I try it this way it prints only the first word on the line. I have to get all the words.
Change that $1 to $0.
Doesn't work! The output is the whole file with the blank lines deleted.
Ah well, ask Ghostdog74 to fix the algorithm then, I’m not the author.
No problem! I’ve solved the issue. I posted the solution in one of my replies.
That was another answer to the problem but it didn’t work for me.
don't just say "it doesn't work". we are not psychics; we will not know what is going on at your side. you have to show how you execute the script, what is in your script, any error messages you encounter, etc. Any info that will help in the troubleshooting. DON'T just say it's not working!
For what it's worth, it's working for me, according to what you require in your first post.
# cat file
aaa
111
bbb
222
333
ccc
# awk 'BEGIN{ RS=""}NR>1{ print $1}' filefile
bbb
ccc
Ah, I see what you two have done. He’s given you test data with only one word per line. You then wrote a program that split records on whitespace and output the first word. Unfortunately that doesn’t work on lines with multiple words, which he didn’t show you. I should have noticed that you changed RS in the first statement.
Try this:
awk 'BEGIN{RS="
+";FS="
"}{print $1}’ inputfile
Ironically this is one of the few places where awk is better than perl because in perl $/ cannot be a regex.
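On the sample file posted above it gives, at least with gawk (which treats a multi-character RS as a regular expression; POSIX awk does not guarantee this):

# awk 'BEGIN{RS="\n\n+";FS="\n"}{print $1}' file
aaa
bbb
ccc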
Sorry, my fault! I posted just an example; my actual file was a little more complicated.
I’ll try with the last post and if all is OK I’ll post back.
i notice in his first post, he wants aaa as well. then the NR>1 in my script will have to be removed. a case of the blind leading the blind, i guess …
This one works just fine! Again, sorry that I didn't post the original file and said that it didn't work.