I have 2 folders, each containing a different version of the same HTML pages. Both have the same structure and use the same file names.
I can easily find the differences between two files using “diff”, but what I actually want is to compare only certain parts of the files, not the whole documents. Like this:
In all of the html files, there is this line:
<!-- START CONTENT -->
and some other lines, then,
<!-- END CONTENT -->
I want to compare only the lines between these two lines.
What I have in mind is copying the “content section” of each version to a separate file, comparing those files, and then removing them.
But, unfortunately, I don’t know how to extract a certain part of a file. Does anyone know how to do it?
I have 2 different folders: “user_guide-1.7.0” and “user_guide-2.0.0”.
They contain the same HTML files, in different versions.
I want to find the differences, but I am not concerned about differences in the header sections of the files, or the footer information, etc.
What I want to achieve is to get the differences in the part of each file starting with
<!-- START CONTENT -->
and ending with
<!-- END CONTENT -->
To use the diff program for this purpose, I somehow need to pull only that part out of the files in both versions, and compare only the parts that I pulled. The only thing I do not know how to do is extract the relevant parts from the HTML documents.
So my question can also be changed to this:
How do I get the part of a file starting with: <!-- START CONTENT -->
and ending with: <!-- END CONTENT -->
You could look into the csplit tool (see man csplit, of course). It splits a file at the places where a given pattern is found. In your case, each file of an old/new pair would be split into 3 parts at the two marker lines, giving 3 pairs of part files, and you can then run diff on the second parts. Throw the part files away before moving on to the next pair.
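
If a script is an option, the same extract-then-compare idea can also be done without temporary files. The following is only a rough sketch in Python using the standard difflib module, not csplit itself. The function names (extract_content, diff_content, diff_folders) are made up for illustration, and it assumes that each marker appears on a line of its own and that the files sit at the same relative paths in both folders.

```python
import difflib
import sys
from pathlib import Path

# Marker lines as described in the question.
START = "<!-- START CONTENT -->"
END = "<!-- END CONTENT -->"


def extract_content(text, start=START, end=END):
    """Return the lines strictly between the start and end marker lines."""
    lines = []
    inside = False
    for line in text.splitlines():
        if not inside:
            if start in line:
                inside = True      # begin collecting on the next line
        elif end in line:
            break                  # stop at the end marker
        else:
            lines.append(line)
    return lines


def diff_content(old_text, new_text, old_name="old", new_name="new"):
    """Unified diff of the content sections only, as a list of lines."""
    return list(difflib.unified_diff(
        extract_content(old_text), extract_content(new_text),
        fromfile=old_name, tofile=new_name, lineterm=""))


def diff_folders(old_dir, new_dir):
    """Yield (relative path, diff lines) for each .html file present in
    both folders whose content sections differ."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    for old_file in sorted(old_dir.rglob("*.html")):
        rel = old_file.relative_to(old_dir)
        new_file = new_dir / rel
        if not new_file.is_file():
            continue               # page exists only in the old version
        diff = diff_content(old_file.read_text(errors="replace"),
                            new_file.read_text(errors="replace"),
                            str(old_file), str(new_file))
        if diff:
            yield rel, diff


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python content_diff.py user_guide-1.7.0 user_guide-2.0.0
    for _rel, diff in diff_folders(sys.argv[1], sys.argv[2]):
        print("\n".join(diff))
```

Differences in the header or footer are never reported, because only the lines between the two markers are passed to the diff. Files that exist in only one of the folders are skipped silently in this sketch.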