[BASH] Partial difference of files

Hi,

I have 2 folders, each containing a different version of the same HTML pages. Both have the same structure and use the same file names.

I can easily find the differences between two files using “diff”, but what I actually want is to compare only certain parts of the files, not the whole documents. Like this:

In all of the html files, there is this line:

<!-- START CONTENT -->

And some other lines, then,

<!-- END CONTENT -->

I want to compare only the lines between these two lines.

What I have in mind is copying the “content section” of each version to a separate file, comparing those files, and then removing them.

But, unfortunately, I don’t know how to extract a certain part of a file. Does anyone know how to do it?

You are not making much sense to me.

You say:

copying the “content section” of each version to a separate file, comparing those files, and then removing them.

But, unfortunately, I don’t know how to extract a certain part of a file. Does anyone know how to do it?

Are you wanting to do that in Bash?

Let me explain in more detail:

I have 2 different folders: “user_guide-1.7.0” and “user_guide-2.0.0”

The HTML files in them are different versions of the same pages.

I want to find the differences, but I am not concerned about differences in the header sections of the files, the footer information, etc.

What I want to get is the difference in the part of each file starting with

<!-- START CONTENT -->
and ending with
<!-- END CONTENT -->

To use diff for this purpose, I somehow need to pull only that part out of the files in both versions, and compare only the parts that I pulled. The only thing I do not know how to do is extract the relevant parts from the HTML documents.

So my question can also be put like this:

How do I get the part starting with: <!-- START CONTENT -->
and ending with: <!-- END CONTENT -->

sed -n '/<!-- START CONTENT -->/,/<!-- END CONTENT -->/p' yourfile
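To show how that sed range could be applied to both folders, here is a rough sketch of the "extract, compare, remove" plan described above. The function names extract_content and diff_content_dirs are made up for illustration, and it assumes every page has both marker lines exactly as written and that the files end in .html.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; adjust the markers to your files.

# Print the lines from the START marker through the END marker, inclusive.
extract_content() {
    sed -n '/<!-- START CONTENT -->/,/<!-- END CONTENT -->/p' "$1"
}

# Compare the content sections of same-named .html files in two folders.
diff_content_dirs() {
    old_dir=$1
    new_dir=$2
    tmp_dir=$(mktemp -d) || return 1
    for old_file in "$old_dir"/*.html; do
        name=$(basename "$old_file")
        new_file=$new_dir/$name
        # Skip pages that do not exist in both versions.
        [ -f "$old_file" ] && [ -f "$new_file" ] || continue
        extract_content "$old_file" > "$tmp_dir/old"
        extract_content "$new_file" > "$tmp_dir/new"
        # diff exits non-zero when the sections differ; report only those pages.
        if ! diff "$tmp_dir/old" "$tmp_dir/new" > "$tmp_dir/out"; then
            echo "=== $name"
            cat "$tmp_dir/out"
        fi
    done
    # Remove the temporary copies once every pair has been compared.
    rm -rf "$tmp_dir"
}
```

With the folders from this thread it would be called as: diff_content_dirs user_guide-1.7.0 user_guide-2.0.0. It only looks at the top level of each folder; pages in subfolders would need find or a similar loop.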

Someone here may know, but it ain’t me.

You could look into the csplit tool (see man csplit, of course). It can split a file at each place where a pattern is found. In your case, each file of an original pair would be split into 3 pieces, and you can then run diff on the two middle pieces. Throw the pieces away before moving on to the next pair.
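For what it's worth, the csplit route might look something like the sketch below. The function name content_via_csplit and the piece_ prefix are made up for illustration, and it assumes both markers are present and that at least one line follows the END marker (csplit may complain otherwise).

```shell
#!/bin/sh
# Sketch only: split one page into three pieces around the markers
# and print the middle piece (START marker through END marker).
content_via_csplit() {
    work=$(mktemp -d) || return 1
    # piece_00: everything before START
    # piece_01: START through END (the +1 puts the split after the END line)
    # piece_02: everything after END
    csplit -s -f "$work/piece_" "$1" \
        '/<!-- START CONTENT -->/' '/<!-- END CONTENT -->/+1'
    cat "$work/piece_01"
    # Throw the pieces away before handling the next file.
    rm -rf "$work"
}
```

The output of this function for the old and the new version of a page can then be saved to two files and passed to diff.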

HTH

EDIT: and of course sed

It worked like a charm :) Thank you!