Automatically removing unnecessary line breaks from a text document

I made a small script to help me keep my torrent blocklists up to date. The script simply downloads all the lists I want and merges them into a single text document.
The problem is that the resulting document contains so many line breaks that I think my torrent client stops reading after a certain point. As a result, a great many addresses are never loaded and blocked.

 
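#!/bin/bash
# Work in the lists directory; give up if it cannot be entered
cd /home/suser/Documents/lists/ || exit 1

# The URLs must be quoted: an unquoted & sends wget to the background
# and cuts the URL short at that point
wget -O list0.gz 'http://list.iblocklist.com/?list=bt_level1&fileformat=p2p&archiveformat=gz'
wget -O list1.gz 'http://list.iblocklist.com/?list=bt_level2&fileformat=p2p&archiveformat=gz'
wget -O list2.gz 'http://list.iblocklist.com/?list=bt_level3&fileformat=p2p&archiveformat=gz'
wget -O list3.gz 'http://list.iblocklist.com/?list=bt_spyware&fileformat=p2p&archiveformat=gz'
wget -O list4.gz 'http://list.iblocklist.com/?list=bt_hijacked&fileformat=p2p&archiveformat=gz'

# Decompress list0.gz ... list4.gz into list0 ... list4
gzip -d list0.gz list1.gz list2.gz list3.gz list4.gz

# Interleave the lists line by line, using a newline as the delimiter
paste -d '\n' list0 list1 list2 list3 list4 > mergedlist

# Remove the intermediate files
rm list0 list1 list2 list3 list4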
 

By default, "paste" joins the corresponding lines of each file with a tab. That made my torrent client recognize even fewer addresses. Making "paste" insert a newline instead of a tab improved things. But once lists 3 and 4 run out (they are shorter), paste keeps inserting newlines for them. The result looks something like this:

Beginning of the file:



# List distributed by iblocklist.com
# List distributed by iblocklist.com
# List distributed by iblocklist.com
# List distributed by iblocklist.com
Hijacked IP Block(SH):;-




Hijacked IP Block(SH):2.56.0.0-2.59.255.255
Detected AP2P on Beijing Gehua CATV:1.88.185.213-1.88.185.213
Beijing Gehua CATV Network Co., Ltd:1.88.0.0-1.91.255.255
Associazione Amici dei Bambini:2.116.68.136-2.116.68.143
I-Deal Direct Interactive, Inc:4.21.117.128-4.21.117.159
Hijacked IP Block(SH):14.192.0.0-14.192.31.255
Detected AP2P on Chunghwa Telecom proxy:1.161.131.134-1.161.131.134
Beijing Teletron Telecom Engineering Co., Ltd:1.92.0.0-1.93.255.255
Information Technology Company (ITC):2.179.128.0-2.179.143.255
I-Deal Direct Interactive, Inc:4.21.149.0-4.21.149.31
Hijacked IP Block(SH):14.192.48.0-14.192.59.255
Detected AP2P on SK Broadband proxy:1.226.51.218-1.226.51.218
Telecom Italia Business:2.112.0.0-2.119.255.255
The Boston Consulting Group:4.2.225.224-4.2.225.231
MicroStrategy:4.43.44.32-4.43.44.63
 
 

Everything looks great. But near the end of the file it looks like this:


mgm home entertainment:80.127.110.176-80.127.110.191




Duijnstee van der Wilk Advocaten:80.127.112.176-80.127.112.183




Van Korlaar advocaten:80.127.112.200-80.127.112.207




Siemens Nederland NV:80.127.113.0-80.127.113.15




Atos Origin:80.127.114.8-80.127.114.15

Is there a command that will remove all the unnecessary newlines?
Or, better, a way to avoid creating them in the first place?

Thank you.
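For reference, the blank lines come from how paste handles files of different lengths: once a shorter file is exhausted, paste still emits an empty field for it. The sketch below reproduces this with two small hypothetical files (a.txt and b.txt are made up for the demo) and contrasts it with cat, which simply appends one file after the other.

```bash
#!/bin/bash
# Two small hypothetical files: a.txt has three lines, b.txt only one
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/a.txt"
printf 'b1\n' > "$dir/b.txt"

# paste interleaves the files line by line; once b.txt runs out it
# still emits an empty field for it, which -d '\n' turns into a blank line
paste -d '\n' "$dir/a.txt" "$dir/b.txt" > "$dir/pasted.txt"

# cat appends one file after the other and adds no padding at all
cat "$dir/a.txt" "$dir/b.txt" > "$dir/catted.txt"

cat "$dir/pasted.txt"   # a1, b1, a2, (blank), a3, (blank)
```

In the original script the equivalent would be `cat list0 list1 list2 list3 list4 > mergedlist`. This assumes the torrent client does not care about the order of the entries, since the lists end up one after another rather than interleaved.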

Are these lines empty, or do they contain white space (space and tab characters)? If they are empty:

grep -v '^$' the-file
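If some of the lines turn out to contain only spaces or tabs, '^$' will not catch them. A minimal sketch using the POSIX character class [[:space:]] instead (the sample file contents are hypothetical):

```bash
#!/bin/bash
# Hypothetical sample: one empty line and one line holding only a space and a tab
dir=$(mktemp -d)
printf 'keep me\n\n \t\nalso keep\n' > "$dir/the-file"

# '^$' removes only truly empty lines; the space/tab line survives
grep -v '^$' "$dir/the-file" > "$dir/no-empty"

# '^[[:space:]]*$' also removes lines made up only of white space
grep -v '^[[:space:]]*$' "$dir/the-file" > "$dir/no-blank"
```

[[:space:]] also matches a carriage return, so the second form also drops "blank" lines that are really just a leftover \r from Windows-style line endings.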

I think they are completely empty, since that command prints the whole file to the console without any blank lines left in it.

I just redirected the output to another file, and the problem is solved:


grep -v '^$' mergedlist > mergedlist2
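Instead of writing a second file, the same filter could sit directly in the script's pipeline, so the padding lines never reach mergedlist. A sketch with two hypothetical stand-ins for the decompressed lists (the entries are made up):

```bash
#!/bin/bash
# Hypothetical stand-ins for the decompressed lists; list3 is shorter
dir=$(mktemp -d)
printf 'Sample A:10.0.0.0-10.0.0.255\nSample B:10.0.1.0-10.0.1.255\n' > "$dir/list0"
printf 'Sample C:10.0.2.0-10.0.2.255\n' > "$dir/list3"

# Drop paste's empty padding lines before writing the merged file
paste -d '\n' "$dir/list0" "$dir/list3" | grep -v '^$' > "$dir/mergedlist"
```

In the real script this means replacing the paste line with `paste -d '\n' list0 list1 list2 list3 list4 | grep -v '^$' > mergedlist`.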

THANK YOU VERY MUCH!

Edit: could you be so kind as to explain what '^$' means?

What is between the quotes is a pattern (a regular expression); see:

man grep

In a pattern, ^ anchors the string that follows it (the one that should match) to the beginning of the line. Thus

grep '^aap'

searches for lines that start with the string aap.

The $ anchors the string that precedes it to the end of the line. Thus

grep 'noot$'

searches for lines that end with the string noot.

Thus

grep '^mies$'

searches for lines that consist only of the string mies.

So what you are doing is searching for lines that have nothing between the start and the end: empty lines.
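The three anchor examples above can be tried on a small sample file. This is a sketch; the file contents are hypothetical and only reuse the example words aap, noot and mies.

```bash
#!/bin/bash
# Hypothetical sample file built from the example words aap, noot and mies
dir=$(mktemp -d)
printf 'aap noot\nmies\naap\n\nnoot mies\n' > "$dir/words"

grep '^aap' "$dir/words" > "$dir/starts"   # lines that start with aap
grep 'noot$' "$dir/words" > "$dir/ends"    # lines that end with noot
grep '^mies$' "$dir/words" > "$dir/only"   # lines consisting only of mies
```

Note that "noot mies" is not selected by 'noot$', because noot is not at the end of that line, and the empty line matches none of the three patterns.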

I understand now: you look for lines that match the pattern '^$', i.e. empty lines.
You then invert the match with -v, so that grep prints the non-matching lines instead, which are all the "populated" lines.
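A quick way to check that reasoning on a file is to count lines with grep -c: with '^$' it counts the empty lines, and adding -v counts the populated ones, so the two should add up to the total. A sketch on a hypothetical sample file:

```bash
#!/bin/bash
# Hypothetical sample: three populated lines and three empty ones
dir=$(mktemp -d)
printf 'a\n\nb\n\n\nc\n' > "$dir/mergedlist"

total=$(grep -c '' "$dir/mergedlist")         # an empty pattern matches every line
empty=$(grep -c '^$' "$dir/mergedlist")       # lines matching ^$
populated=$(grep -vc '^$' "$dir/mergedlist")  # -v inverts: lines NOT matching ^$
echo "total=$total empty=$empty populated=$populated"
```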

That's clever!

Thank you.

That is the Unix/Linux way of life: combining tiny tools into useful sequences, lol!