Hi, I am Rupesh from India. I want to download a website with wget for offline viewing; in other words, I want to mirror the website and keep an exact copy of it on my hard disk. I have installed openSUSE Leap 42.3 with wget and its GUI.
I previously downloaded the website using an offline browser called Extreme Picture Finder. 90% of the files I want were downloaded successfully, so now I want to download the remaining 10%.
I have read the wget manual page and worked through some wget tutorials that I found by searching the web. I tried the commands from those tutorials, and I am providing their output below.
I issued the following command:
wget -c -t 0 --recursive --force-directories -o logfile.txt ‐‐recursive ‐‐no-clobber ‐‐accept jpg,gif,png,jpeg,mp3,MP3,pdf
This command produced the following output:
idn_encode failed (-304): ‘string contains a disallowed character’
idn_encode failed (-304): ‘string contains a disallowed character’
--2017-09-29 18:08:43-- http://%E2%80%90%E2%80%90recursive/
Resolving ‐‐recursive (‐‐recursive)... failed: Name or service not known.
wget: unable to resolve host address ‘‐‐recursive’
idn_encode failed (-304): ‘string contains a disallowed character’
idn_encode failed (-304): ‘string contains a disallowed character’
--2017-09-29 18:08:43-- http://%E2%80%90%E2%80%90no-clobber/
Resolving ‐‐no-clobber (‐‐no-clobber)... failed: Name or service not known.
wget: unable to resolve host address ‘‐‐no-clobber’
idn_encode failed (-304): ‘string contains a disallowed character’
idn_encode failed (-304): ‘string contains a disallowed character’
--2017-09-29 18:08:43-- http://%E2%80%90%E2%80%90accept/
Resolving ‐‐accept (‐‐accept)... failed: Name or service not known.
wget: unable to resolve host address ‘‐‐accept’
--2017-09-29 18:08:43-- http://jpg,gif,png,jpeg,mp3,mp3,pdf/
Resolving jpg,gif,png,jpeg,mp3,mp3,pdf (jpg,gif,png,jpeg,mp3,mp3,pdf)... failed: Name or service not known.
wget: unable to resolve host address ‘jpg,gif,png,jpeg,mp3,mp3,pdf’
idn_encode failed (-304): ‘string contains a disallowed character’
idn_encode failed (-304): ‘string contains a disallowed character’
--2017-09-29 18:08:43-- http://%E2%80%90%E2%80%90directory-prefix=/mnt/source/downloads/lectures/
Resolving ‐‐directory-prefix= (‐‐directory-prefix=)... failed: Name or service not known.
wget: unable to resolve host address ‘‐‐directory-prefix=’
--2017-09-29 18:08:43-- http://www.pravachanam.com/categorybrowselist/20
Resolving www.pravachanam.com (www.pravachanam.com)... 162.144.54.142
Connecting to www.pravachanam.com (www.pravachanam.com)|162.144.54.142|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘www.pravachanam.com/categorybrowselist/20’
0K .......... .......... .......... .......... .......... 31.3K
50K .......... .... 1.54M=1.6s
2017-09-29 18:08:46 (40.0 KB/s) - ‘www.pravachanam.com/categorybrowselist/20’ saved [65802]
Loading robots.txt; please ignore errors.
--2017-09-29 18:08:46-- http://www.pravachanam.com/robots.txt
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 404 Not Found
2017-09-29 18:08:48 ERROR 404: Not Found.
--2017-09-29 18:08:48-- http://www.pravachanam.com/sites/default/files/favicon.ico
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 404 Not Found
2017-09-29 18:08:51 ERROR 404: Not Found.
--2017-09-29 18:08:51-- http://www.pravachanam.com/modules/system/system.base.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 5428 (5.3K) [text/css]
Saving to: ‘www.pravachanam.com/modules/system/system.base.css?owgg5m’
0K ..... 100% 16.2K=0.3s
2017-09-29 18:08:51 (16.2 KB/s) - ‘www.pravachanam.com/modules/system/system.base.css?owgg5m’ saved [5428/5428]
--2017-09-29 18:08:51-- http://www.pravachanam.com/modules/system/system.menus.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 2035 (2.0K) [text/css]
Saving to: ‘www.pravachanam.com/modules/system/system.menus.css?owgg5m’
0K . 100% 236K=0.008s
2017-09-29 18:08:52 (236 KB/s) - ‘www.pravachanam.com/modules/system/system.menus.css?owgg5m’ saved [2035/2035]
--2017-09-29 18:08:52-- http://www.pravachanam.com/modules/system/system.messages.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 961 [text/css]
Saving to: ‘www.pravachanam.com/modules/system/system.messages.css?owgg5m’
0K 100% 255M=0s
2017-09-29 18:08:52 (255 MB/s) - ‘www.pravachanam.com/modules/system/system.messages.css?owgg5m’ saved [961/961]
--2017-09-29 18:08:52-- http://www.pravachanam.com/modules/system/system.theme.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 3711 (3.6K) [text/css]
Saving to: ‘www.pravachanam.com/modules/system/system.theme.css?owgg5m’
0K ... 100% 374K=0.01s
2017-09-29 18:08:52 (374 KB/s) - ‘www.pravachanam.com/modules/system/system.theme.css?owgg5m’ saved [3711/3711]
--2017-09-29 18:08:52-- http://www.pravachanam.com/sites/all/libraries/mediaelement/build/mediaelementplayer.min.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 404 Not Found
2017-09-29 18:08:54 ERROR 404: Not Found.
--2017-09-29 18:08:54-- http://www.pravachanam.com/sites/all/modules/views_slideshow/views_slideshow.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 404 Not Found
2017-09-29 18:08:56 ERROR 404: Not Found.
--2017-09-29 18:08:56-- http://www.pravachanam.com/modules/comment/comment.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 184 [text/css]
Saving to: ‘www.pravachanam.com/modules/comment/comment.css?owgg5m’
From this output it is clear that wget is treating the options as website addresses.
Next, I issued the following command:
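The URLs in the log, such as http://%E2%80%90%E2%80%90recursive/, show why this happens: each %E2%80%90 is the percent-encoded UTF-8 form of U+2010 HYPHEN, so the dashes in front of recursive, no-clobber, accept and directory-prefix are not ASCII hyphens, and wget does not recognise them as options. Retyping the dashes by hand fixes it. As a minimal sketch, assuming a POSIX shell with the standard printf and sed utilities, the hypothetical helper fix_hyphens below replaces U+2010 with an ASCII hyphen in a copied command:

```shell
#!/bin/sh
# U+2010 HYPHEN, built from its UTF-8 bytes E2 80 90 (octal 342 200 220).
unicode_hyphen=$(printf '\342\200\220')

# Hypothetical helper: print the given command text with every U+2010 replaced
# by an ASCII '-'. LC_ALL=C makes sed compare raw bytes, so the three-byte
# sequence is matched the same way in any locale.
fix_hyphens() {
    printf '%s\n' "$1" | LC_ALL=C sed "s/$unicode_hyphen/-/g"
}

# Rebuild part of the failing command with the same look-alike dashes.
bad="wget ${unicode_hyphen}${unicode_hyphen}recursive ${unicode_hyphen}${unicode_hyphen}no-clobber"
fix_hyphens "$bad"   # prints: wget --recursive --no-clobber
```

The same function can be applied to any command copied from a web page before it is pasted into the terminal.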
wget --level=1 --recursive --no-parent --no-clobber --accept mp3,MP3 http://www.pravachanam.com/categorybrowselist/20
Running this command created a file called outfile.txt and a directory called www.pravachanam.com in my current directory. wget created some subdirectories, but they do not match the source website; that is, it did not preserve the source website's directory structure.
In outfile.txt I found some lines ending in .mp3, but when I looked for the corresponding files in the directory created by wget, I could not find them, or even the directories that should contain them.
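To narrow down which files are really missing, the sketch below pulls every .mp3 URL out of a wget log and checks whether the file exists where wget saves it by default in recursive mode: a directory named after the host, followed by the URL path (the log above shows this layout, e.g. www.pravachanam.com/categorybrowselist/20). The function name check_mp3s is hypothetical; it assumes it is run from the directory in which wget was started, and that the URLs contain no percent-escapes or query strings, which wget may rewrite when choosing local file names.

```shell
#!/bin/sh
# Hypothetical helper: list every .mp3 URL found in a wget log and report whether
# the file exists at wget's default recursive location, i.e. host name followed
# by the URL path (www.example.com/dir/file.mp3). Run it from the directory in
# which wget was started. Assumes URLs without percent-escapes or query strings.
check_mp3s() {
    logfile=$1
    # -o prints only the matching URL, -i also catches .MP3, sort -u drops repeats
    grep -oiE 'https?://[^ ]+\.mp3' "$logfile" | sort -u |
    while IFS= read -r url; do
        path=${url#*://}              # drop "http://" or "https://"
        if [ -f "$path" ]; then
            echo "present  $path"
        else
            echo "missing  $path"
        fi
    done
}
```

For example, check_mp3s outfile.txt | grep '^missing' lists the mp3 files that appear in the log but are not on disk.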
I also installed gwget, the GNOME GUI for wget, and tried a number of its settings, but it failed: it downloaded only the home page, stopped, and then displayed a message saying that the website had been downloaded successfully. The GUI also does not expose all the options available in the command-line version of wget.
Please suggest how to download the mp3 files from a website using wget with the following options:
1) An option to keep the same directory structure as the source website.
2) An option to skip files that have already been downloaded.
3) I want to download all the mp3 files except those whose file or folder names contain certain words, such as xyz. How can I skip downloading files or folders whose names contain xyz?
4) An option to download files recursively without visiting other websites.
5) An option to retry downloads indefinitely in case of network failure.
6) An option to resume files that were previously downloaded only partially.
7) An option to download only mp3 files and reject all other file types, including html, php and css files, if possible.
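A minimal sketch of a single wget invocation covering the seven points above, wrapped in a hypothetical shell function mirror_mp3. It assumes GNU wget 1.14 or later (for --reject-regex) and a server that supports HTTP Range requests (for --continue); xyz is only a placeholder pattern, and the numbers in the comments refer to the list above.

```shell
#!/bin/sh
# Sketch: mirror only the mp3 files reachable from a start URL with GNU wget.
# Assumes wget >= 1.14 (for --reject-regex); "xyz" is a placeholder pattern.
#
#   --recursive             (4) recurse; other hosts are only visited with -H
#   --level=inf             (4) no depth limit (the default depth is 5)
#   --force-directories     (1) save files as host/path/file, following the URL paths
#   --continue              (2, 6) resume partial files; files that are already
#                           complete are not fetched again (needs Range support)
#   --tries=0               (5) retry without limit
#   --retry-connrefused     (5) also retry when the connection is refused
#   --waitretry=10          (5) wait up to 10 s between retries of a failed file
#   --accept=mp3,MP3        (7) keep only .mp3/.MP3 files
#   --reject-regex=PATTERN  (3) skip every URL (file or folder) matching PATTERN
mirror_mp3() {
    start_url=$1
    skip_pattern=${2:-xyz}
    wget --recursive --level=inf \
         --force-directories \
         --continue \
         --tries=0 --retry-connrefused --waitretry=10 \
         --accept=mp3,MP3 \
         --reject-regex="$skip_pattern" \
         --output-file=logfile.txt \
         "$start_url"
}
```

For example: mirror_mp3 http://www.pravachanam.com/categorybrowselist/20 xyz. Running the same call again after an interruption should pick up where it left off. Two options are left out on purpose: --no-clobber makes wget skip any file that already exists locally, so a partially downloaded file would never be completed, and --no-parent limits wget to URLs below the start page's directory, which could exclude mp3 files stored elsewhere on the same site. The wget manual also notes that --accept does not affect the downloading of HTML pages, so wget still fetches the pages it needs to find links and should remove them afterwards when they do not match the accept list.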
Many of you may suggest that I read the wget manual page and experiment on my own, but advice from experts like you is the key to success. I am reading the wget manuals and guides in the meantime, but your help would be very valuable. I request as many people as possible to reply to this thread and help me.
Regards,
Rupesh.