Cases “a”, “a*”, “c[a-z]” and “b[a-z]” seem understandable to me. But I don’t get the outputs for the rest of the cases at all… Why does “c[a-z]*” output both words fully highlighted? Why is the output for “b[a-z]*” empty? Why are the outputs for all the regexes with + and ? empty?
You’re used to using regular expressions, I’m guessing, in languages like
Perl, where Perl Compatible Regular Expressions (PCRE) are the norm.
‘grep’, while able to handle similar regexes, has different defaults with
regard to which characters have special meaning: by default it uses POSIX
basic regular expressions (BRE), in which ‘+’ and ‘?’ are ordinary characters.
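A quick way to see the different defaults is to give the same two lines to grep in its default (basic) mode and with -E (extended). This is a minimal sketch assuming GNU grep and a POSIX-style shell; the test lines are made up for the demo.

```shell
# Two test lines: "caat" has a run of a's, "ca+" has a literal plus sign.
# Default (basic) syntax: + is an ordinary character, so only "ca+" matches.
printf 'caat\nca+\n' | grep 'ca+'
# Extended syntax (-E): + means "one or more of the preceding a",
# so both lines match.
printf 'caat\nca+\n' | grep -E 'ca+'
```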
In order to get the results you want with any shell command, I have at
least one big recommendation: always use quotes. If possible, including
in this case, use single quotes. They ensure that every character is
passed to the command as-is rather than first being interpreted by the
shell for special meaning. This may not be the problem in this case,
but it’s very easy to get odd results when the shell changes something
before passing a parameter or argument to a command.
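One of the simplest things the shell does before a command runs is split unquoted text at spaces. The sketch below (assuming bash or another POSIX shell and GNU grep; "test.txt" is a hypothetical scratch file) shows how that changes what grep receives.

```shell
# Work in a scratch directory so nothing here touches your own files.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'hello world\nhello there\n' > test.txt
# Unquoted: the shell splits at the space, so grep gets the pattern "hello"
# and two file names, "world" (which does not exist) and "test.txt".
# Expect an error about "world" plus both lines, prefixed with "test.txt:".
grep hello world test.txt
# Single-quoted: grep gets one pattern, "hello world", and prints one line.
grep 'hello world' test.txt
```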
In your case, though, the difference in how special characters in a regex
are interpreted is what is throwing you off. Escape some of the specials
(‘+’ and ‘?’ in particular; GNU grep treats ‘\+’ and ‘\?’ as repetition
operators in its default syntax) or use ‘grep -E’, and you’ll see the
results you expect. For an example that shows this another way, put lines
with the following data in your test file:
ca+
ca?
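Here is a sketch of that test data in action, with a few plain lines added so the escaped operators have something to match. It assumes GNU grep, where ‘\?’ and ‘\+’ in the default syntax are GNU extensions rather than standard POSIX.

```shell
# The two lines suggested above, plus three lines without special characters.
# Default (basic) syntax: ? is literal, so only the "ca?" line matches.
printf 'c\nca\ncaa\nca+\nca?\n' | grep 'ca?'
# \? (GNU extension) makes the "a" optional: all five lines contain "c", so all match.
printf 'c\nca\ncaa\nca+\nca?\n' | grep 'ca\?'
# \+ (GNU extension) means one or more "a": the four lines containing "ca" match.
printf 'c\nca\ncaa\nca+\nca?\n' | grep 'ca\+'
```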
Good luck.
Depending on what’s in your current directory, an unquoted “b[a-z]*” might match a filename there, and then that filename is what “grep” sees as an argument.
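To reproduce this, the sketch below creates a file whose name matches the glob. It assumes bash (or another POSIX shell) with default settings and GNU grep; the file names "bat" and "test.txt" are hypothetical. With default settings, a glob that matches no file is passed through unchanged, which is why the unquoted form can seem to work in one directory and fail in another.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
touch bat                        # a file whose name matches the glob b[a-z]*
printf 'ball\nbat\n' > test.txt
# Unquoted: the shell replaces b[a-z]* with "bat" before grep runs,
# so this is really: grep bat test.txt   (prints only "bat")
grep b[a-z]* test.txt
# Quoted: grep receives the regex b[a-z]* and matches both lines.
grep 'b[a-z]*' test.txt
```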
Thanks, the quotes in the b[a-z]* case did the trick. Now I indeed get something similar to the c[a-z]* case: all letters from the first matched b or c onward are highlighted.
What I still don’t understand is how exactly the regex works.
Actually, I came here for help understanding regex. As I said, I’m just starting to study them for a short-term job I’ll need to do later. I’ve already read documentation and tutorials, but for some reason they weren’t enough. For example, I (barely) understood that anchors (^ and $) match the start and end of a line respectively, and also character classes (or sets), etc. But nowhere did I manage to understand how repeaters (?, * and +) work. I read “? attempts to match 0 or 1 time; * attempts 0 or more; + attempts 1 or more”. OK, so in my examples I’d expect some matches in all the cases using ? and +, but there are none! WTH?
I even tried testing regexes here. I entered the word “daebe”, then the regex “a”. It matched one ‘a’. OK. Then “a?”. No match!? Just the cursor placed before the first character of the word!? How do repeaters work then, including *, even if it is indeed doing something for me?
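What the tester most likely showed is the leftmost match of “a?”: at position 0 there is a “d”, and zero a’s is a valid match, so the first match is the empty string before the “d”. The sketch below makes that visible by wrapping the first match in brackets with sed. It assumes a sed that supports -E (GNU sed does) and GNU grep.

```shell
# sed wraps the first match in [ ]: the leftmost match of a? is the empty
# string before "d", which is what the tester's cursor was pointing at.
echo daebe | sed -E 's/a?/[&]/'      # prints: []daebe
# a+ must consume at least one "a", so its first match is the real "a".
echo daebe | sed -E 's/a+/[&]/'      # prints: d[a]ebe
# Because a? can match nothing, every line matches it, even one with no "a".
printf 'daebe\nxyz\n' | grep -cE 'a?'    # prints: 2
```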
Also, what’s the difference between single and double quotes, and in which cases should each one be used?
Thanks.
As far as I know, in grep’s default (basic) regular expressions, “?” matches only a literal “?” character.
You may be confusing regular expression pattern matching with shell pattern matching. In a shell pattern, “?” matches exactly one character.
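The contrast can be shown side by side. This sketch assumes a POSIX shell and GNU grep in its default syntax (with -E, “?” would instead make the preceding character optional).

```shell
# Shell pattern matching (globs): ? stands for exactly one character.
case abc in
  a?c) echo 'glob a?c matches abc' ;;
esac
# grep's default regex syntax: ? is an ordinary character.
echo abc   | grep -c 'a?c'    # prints: 0
echo 'a?c' | grep -c 'a?c'    # prints: 1
```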
As for single or double quoting: quoting is a shell thing. The shell looks at your command line, removes the quotes, and passes the result to the command (such as “grep”).
Most of the time “double quotes” and ‘single quotes’ do about the same thing: the shell removes the quotes and does not treat the characters inside as special. But there are some characters where the shell makes a distinction: inside double quotes, $ (variables and command substitution), backquotes and some backslash escapes are still processed, while inside single quotes nothing is special. The details may also depend on which shell you use.
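A minimal illustration, assuming a POSIX shell; the variable name pattern is hypothetical. printf '%s\n' prints each argument exactly as the shell hands it over, so it shows what grep would receive.

```shell
pattern='b[a-z]*'
# Double quotes: the shell expands $pattern, but does no globbing or
# word splitting on the result, so a command would get exactly b[a-z]*.
printf '%s\n' "$pattern"     # prints: b[a-z]*
# Single quotes: nothing is expanded; the text is passed as-is.
printf '%s\n' '$pattern'     # prints: $pattern
```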
I read some more and finally understood things better in general. The key was realizing that grep highlights all the matches of a regex on a line at once, without distinguishing the individual matches.
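If you want to see the individual matches instead, grep’s -o option prints each match on its own line. A small sketch assuming GNU grep, using made-up input with two separate matches of c[a-z]*:

```shell
# One line containing two separate matches of c[a-z]*.
echo 'cab cob' | grep 'c[a-z]*'       # prints the whole line: cab cob
echo 'cab cob' | grep -o 'c[a-z]*'    # prints each match on its own line: cab, then cob
```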
I tried adding some special characters to the “test” file, such as “.” and “^”. I tried the regex “[.]” and it did match the dots I had in the file. So I thought that by putting special characters inside brackets they’d be treated as literals, just like in a character class (or set). So I tried the regex “[^]” and got this:
user@linux-loyv:~> grep -E '[^]' ~/Downloads/test
grep: [ or [^ unpaired
I had to make that regex into “[\^]” to make it match all the “^” in the file. The curious thing is, “[.]” and “[\.]” seemingly gave me the same result (though there are no “\” characters in the file).
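If you put the dot inside [ ] or escape it (\.), it becomes literal. Inside brackets the backslash is literal as well, so “[\.]” matches a backslash or a dot, which is why it gave the same result as “[.]” on a file without backslashes. Outside brackets you need to escape the \ itself (\\) for it to become literal. As for “[^]”: a ^ right after [ negates the set, and a ] right after [^ is taken as a literal ], so the bracket is never closed. Putting the ^ anywhere but first, as in “[\^]”, makes it literal.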
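The sketch below checks these rules against four made-up lines (a dot, plain letters, a caret and a backslash), counting matching lines with grep -c. It assumes GNU grep and a POSIX shell; the single quotes keep the backslash in the data literal.

```shell
# Four test lines: a dot, no special characters, a caret, and a backslash.
data='a.b
ab
^x
back\slash'
printf '%s\n' "$data" | grep -c '.'      # 4: an unescaped dot matches any character
printf '%s\n' "$data" | grep -c '[.]'    # 1: inside [ ] the dot is literal
printf '%s\n' "$data" | grep -c '\.'     # 1: an escaped dot is literal too
printf '%s\n' "$data" | grep -c '[\.]'   # 2: inside [ ] the backslash is literal: matches "\" or "."
printf '%s\n' "$data" | grep -c '[\^]'   # 2: likewise "\" or "^"
printf '%s\n' "$data" | grep -c '[.^]'   # 2: ^ is literal when it is not first inside [ ]
```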
man 7 regex
is a good place to start if you have it. By default grep uses BRE, in which some characters that are special in ERE (such as +, ?, |, parentheses and braces) are ordinary characters, so you need to escape them to get their special meaning (or use grep -E).
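Grouping and intervals are a good example of the escaping difference. A sketch assuming GNU grep, with made-up input:

```shell
# Grouping and intervals: escaped in basic (default) syntax, bare with -E.
printf 'abab\nab\n' | grep -c '\(ab\)\{2\}'   # 1: only "abab" has ab twice in a row
printf 'abab\nab\n' | grep -cE '(ab){2}'      # 1: same pattern in extended syntax
printf 'abab\nab\n' | grep -c '(ab){2}'       # 0: in basic syntax these are literal ( ) { }
```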
Shell quoting is also important to understand when working with regexes. Also, be careful with the regexes you find on the interwebz: some of them might be PCRE.
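A common case is \d for “digit”, which is PCRE syntax. This sketch assumes GNU grep: POSIX classes such as [[:digit:]] or [0-9] work in both basic and extended modes, while grep -P accepts Perl-style syntax only if your grep was built with PCRE support, so the sketch checks for that first. The input lines are made up.

```shell
# POSIX equivalents of PCRE's \d work in grep's basic and extended modes.
printf 'room 42\nno digits\n' | grep -cE '[[:digit:]]+'   # 1
printf 'room 42\nno digits\n' | grep -c '[0-9]'           # 1
# grep -P understands \d, but only if grep was built with PCRE support.
if echo | grep -qP '' 2>/dev/null; then
  printf 'room 42\nno digits\n' | grep -cP '\d+'          # 1
fi
```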
Helping an uncle with some admin (or listing…) tasks: for example, finding files with certain modification dates, mails from senders whose last names begin with certain letters, finding file names that have only certain letters or characters, etc. Or so I understood.
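For the “file names with only certain letters” kind of task, the anchors ^ and $ do most of the work. A minimal sketch assuming GNU grep and a scratch directory with hypothetical file names; piping ls into grep is fine for a quick look, though it would be confused by names containing newlines.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
touch abc abd cab xyz      # hypothetical file names for the demo
# ^ and $ anchor the whole name, so only names made solely of a, b and c match.
ls | grep -E '^[abc]+$'    # prints: abc, then cab
```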
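Just remember: if it involves your favorite Windows OS, or files created or written by a Windows program, you might encounter Windows line endings, aka CRLF (an extra carriage return, \r, before each newline), which can stop patterns ending in $ from matching. Other than that, good luck with the admin job.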
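A quick way to see the CRLF effect, assuming GNU grep on Linux and using a made-up one-line input; tr is just one way to strip the carriage returns.

```shell
# A line with a Windows (CRLF) ending: the CR sits just before the newline,
# so $ does not match right after "hello".
printf 'hello\r\n' | grep -c 'hello$'                  # prints: 0
# Stripping the CRs first makes the anchor work again.
printf 'hello\r\n' | tr -d '\r' | grep -c 'hello$'     # prints: 1
```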