
Thread: using wget as an offline browser to download all mp3 files from a website.

  1. #1
    Join Date
    Aug 2011
    Location
    India
    Posts
    278

    Default using wget as an offline browser to download all mp3 files from a website.

    Hi, I am Rupesh from India. I want to download a website with wget for offline viewing; that is, I want to mirror the site and keep an exact copy of it on my hard disk. I have installed openSUSE Leap 42.3 with wget and its GUI.

    I previously downloaded the website with an offline browser called Extreme Picture Finder. About 90% of the files I want were downloaded successfully, so I now want to download the remaining 10%.

    I have read the wget manual page and worked through some wget tutorials found by searching the web. I tried what the tutorials suggested, and the output of those commands is given below.

    I issued the following command:

    Code:
    wget -c -t 0 --recursive --force-directories   -o logfile.txt ‐‐recursive ‐‐no-clobber ‐‐accept jpg,gif,png,jpeg,mp3,MP3,pdf

    For the above command I got the following output:

    Code:
    idn_encode failed (-304): ‘string contains a disallowed character’
    idn_encode failed (-304): ‘string contains a disallowed character’
    --2017-09-29 18:08:43--  http://%E2%80%90%E2%80%90recursive/
    Resolving ‐‐recursive (‐‐recursive)... failed: Name or service not known.
    wget: unable to resolve host address ‘‐‐recursive’
    idn_encode failed (-304): ‘string contains a disallowed character’
    idn_encode failed (-304): ‘string contains a disallowed character’
    --2017-09-29 18:08:43--  http://%E2%80%90%E2%80%90no-clobber/
    Resolving ‐‐no-clobber (‐‐no-clobber)... failed: Name or service not known.
    wget: unable to resolve host address ‘‐‐no-clobber’
    idn_encode failed (-304): ‘string contains a disallowed character’
    idn_encode failed (-304): ‘string contains a disallowed character’
    --2017-09-29 18:08:43--  http://%E2%80%90%E2%80%90accept/
    Resolving ‐‐accept (‐‐accept)... failed: Name or service not known.
    wget: unable to resolve host address ‘‐‐accept’
    --2017-09-29 18:08:43--  http://jpg,gif,png,jpeg,mp3,mp3,pdf/
    Resolving jpg,gif,png,jpeg,mp3,mp3,pdf (jpg,gif,png,jpeg,mp3,mp3,pdf)... failed: Name or service not known.
    wget: unable to resolve host address ‘jpg,gif,png,jpeg,mp3,mp3,pdf’
    idn_encode failed (-304): ‘string contains a disallowed character’
    idn_encode failed (-304): ‘string contains a disallowed character’
    --2017-09-29 18:08:43--  http://%E2%80%90%E2%80%90directory-prefix=/mnt/source/downloads/lectures/
    Resolving ‐‐directory-prefix= (‐‐directory-prefix=)... failed: Name or service not known.
    wget: unable to resolve host address ‘‐‐directory-prefix=’
    --2017-09-29 18:08:43--  http://www.pravachanam.com/categorybrowselist/20
    Resolving www.pravachanam.com (www.pravachanam.com)... 162.144.54.142
    Connecting to www.pravachanam.com (www.pravachanam.com)|162.144.54.142|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Saving to: ‘www.pravachanam.com/categorybrowselist/20’
    
         0K .......... .......... .......... .......... .......... 31.3K
        50K .......... ....                                        1.54M=1.6s
    
    2017-09-29 18:08:46 (40.0 KB/s) - ‘www.pravachanam.com/categorybrowselist/20’ saved [65802]
    
    Loading robots.txt; please ignore errors.
    --2017-09-29 18:08:46--  http://www.pravachanam.com/robots.txt
    Reusing existing connection to www.pravachanam.com:80.
    HTTP request sent, awaiting response... 404 Not Found
    2017-09-29 18:08:48 ERROR 404: Not Found.
    
    --2017-09-29 18:08:48--  http://www.pravachanam.com/sites/default/files/favicon.ico
    Reusing existing connection to www.pravachanam.com:80.
    HTTP request sent, awaiting response... 404 Not Found
    2017-09-29 18:08:51 ERROR 404: Not Found.
    
    --2017-09-29 18:08:51--  http://www.pravachanam.com/modules/system/system.base.css?owgg5m
    Reusing existing connection to www.pravachanam.com:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 5428 (5.3K) [text/css]
    Saving to: ‘www.pravachanam.com/modules/system/system.base.css?owgg5m’
    
         0K .....                                                 100% 16.2K=0.3s
    
    2017-09-29 18:08:51 (16.2 KB/s) - ‘www.pravachanam.com/modules/system/system.base.css?owgg5m’ saved [5428/5428]
    
    --2017-09-29 18:08:51--  http://www.pravachanam.com/modules/system/system.menus.css?owgg5m
    Reusing existing connection to www.pravachanam.com:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 2035 (2.0K) [text/css]
    Saving to: ‘www.pravachanam.com/modules/system/system.menus.css?owgg5m’
    
         0K .                                                     100%  236K=0.008s
    
    2017-09-29 18:08:52 (236 KB/s) - ‘www.pravachanam.com/modules/system/system.menus.css?owgg5m’ saved [2035/2035]
    
    --2017-09-29 18:08:52--  http://www.pravachanam.com/modules/system/system.messages.css?owgg5m
    Reusing existing connection to www.pravachanam.com:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 961 [text/css]
    Saving to: ‘www.pravachanam.com/modules/system/system.messages.css?owgg5m’
    
         0K                                                       100%  255M=0s
    
    2017-09-29 18:08:52 (255 MB/s) - ‘www.pravachanam.com/modules/system/system.messages.css?owgg5m’ saved [961/961]
    
    --2017-09-29 18:08:52--  http://www.pravachanam.com/modules/system/system.theme.css?owgg5m
    Reusing existing connection to www.pravachanam.com:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 3711 (3.6K) [text/css]
    Saving to: ‘www.pravachanam.com/modules/system/system.theme.css?owgg5m’
    
         0K ...                                                   100%  374K=0.01s
    
    2017-09-29 18:08:52 (374 KB/s) - ‘www.pravachanam.com/modules/system/system.theme.css?owgg5m’ saved [3711/3711]
    
    --2017-09-29 18:08:52--  http://www.pravachanam.com/sites/all/libraries/mediaelement/build/mediaelementplayer.min.css?owgg5m
    Reusing existing connection to www.pravachanam.com:80.
    HTTP request sent, awaiting response... 404 Not Found
    2017-09-29 18:08:54 ERROR 404: Not Found.
    
    --2017-09-29 18:08:54--  http://www.pravachanam.com/sites/all/modules/views_slideshow/views_slideshow.css?owgg5m
    Reusing existing connection to www.pravachanam.com:80.
    HTTP request sent, awaiting response... 404 Not Found
    2017-09-29 18:08:56 ERROR 404: Not Found.
    
    --2017-09-29 18:08:56--  http://www.pravachanam.com/modules/comment/comment.css?owgg5m
    Reusing existing connection to www.pravachanam.com:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 184 [text/css]
    Saving to: ‘www.pravachanam.com/modules/comment/comment.css?owgg5m’
    Examining the above output, it is clear that wget is treating the options as website addresses.

    After that I issued the following command:

    Code:
    wget ‐‐level=1 ‐‐recursive ‐‐no-parent ‐‐no-clobber   ‐‐accept mp3,MP3  http://www.pravachanam.com/categorybrowselist/20
    Running the above command created a file outfile.txt and a directory called www.pravachanam.com under my current directory. wget created some directories, but they do not match the source website; that is, it did not reproduce the site's directory structure.

    In outfile.txt I found some lines ending in .mp3. I tried to locate the corresponding files in the directory created by wget, but found neither the files nor the directory structure that should contain them.

    I also installed and tried gwget, the GNOME GUI for wget. I tried a number of its settings, but it downloaded only the home page, then stopped and reported that the website had been downloaded successfully. The GUI does not expose all of the options available in the command-line version of wget.


    Please suggest how to download the mp3 files from a website using wget, with the following options:

    1) An option to keep the same directory structure as the source website.
    2) An option to skip files that have already been downloaded.
    3) I want to download all mp3 files except files and folders whose names contain certain words, such as xyz. How can I skip a download when the file or folder name contains xyz?
    4) An option to download files recursively without visiting other websites.
    5) An option to retry downloads indefinitely after a network failure.
    6) An option to resume files that were only partially downloaded earlier.
    7) An option to download only mp3 files and reject all other file types, including html, php and css files if possible.
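
For reference, the seven requirements above map roughly onto the wget options sketched here. This is an outline under stated assumptions rather than a tested recipe for this site: the option names are taken from the wget manual, `mirror_mp3` and the `WGET` variable are made up for the example, `xyz` is the placeholder word from the list, and `--reject-regex` needs a reasonably recent wget (1.14 or later, as far as I know).

```shell
#!/bin/sh
# mirror_mp3 URL DEST: recursive, resumable download of mp3 files only.
# Set WGET=echo to print the argument list instead of contacting the server.
mirror_mp3() {
    # 1)   --force-directories  keep the host/path layout under DEST
    # 2,6) --continue           resume partial files; complete ones are left
    #                           alone if the server supports range requests
    # 3)   --reject-regex       skip any URL that contains "xyz"
    # 4)   --recursive          follow links; other hosts are not visited
    #                           unless --span-hosts is added
    # 5)   --tries=0            retry without limit
    # 7)   --accept             keep only mp3 files (HTML pages are fetched
    #                           to find links, then deleted)
    "${WGET:-wget}" \
        --recursive --level=inf \
        --force-directories \
        --continue \
        --tries=0 --retry-connrefused --waitretry=10 \
        --accept=mp3,MP3 \
        --reject-regex='xyz' \
        --directory-prefix="$2" \
        "$1"
}

# Dry run: show the options without downloading anything.
WGET=echo mirror_mp3 'http://www.pravachanam.com/categorybrowselist/20' '/mnt/source/downloads/lectures/'
```

Note that `--no-clobber` is left out on purpose: combined with `--continue` it would, as I understand it, skip partially downloaded files instead of resuming them.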

    Many of you may tell me to read the wget manual page and experiment on my own, but advice from experienced people like you is a great help. I am still reading the wget manuals and guides, and I would be grateful for replies from as many people as possible.

    Regards,
    Rupesh.

  2. #2
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    32,322
    Blog Entries
    15

    Default Re: using wget as an offline browser to download all mp3 files from a website.

    Hi
    Install and use httrack, which is in the openSUSE Leap 42.3 release:

    Code:
    zypper in httrack
    http://www.httrack.com/
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE

  3. #3
    Join Date
    Aug 2011
    Location
    India
    Posts
    278

    Default Re: using wget as an offline browser to download all mp3 files from a website.

    I have used a number of offline browsers before, including httrack, and they all share the same limitations:
    1. They cannot resume once the internet connection comes back; they simply stop downloading when the connection drops.
    2. They do not fetch the remainder of files that were only partially downloaded earlier.

    wget does not have these drawbacks, so I prefer to use it instead.
    Regards,
    Rupesh.

  4. #4
    Join Date
    Aug 2011
    Location
    India
    Posts
    278

    Default Re: using wget as an offline browser to download all mp3 files from a website.

    I have now issued the wget command below; the exact command with its options is shown, followed by some of its output.

    Code:
    linux-ps66:~ # wget -c -t 0 -v --recursive --force-directories ‐‐recursive ‐‐no-clobber ‐‐accept jpg,gif,png,jpeg,mp3,MP3,pdf  ‐‐directory-prefix=/mnt/source/downloads/lectures/   http://www.pravachanam.com/categorybrowselist/20q
    idn_encode failed (-304): ‘string contains a disallowed character’
    idn_encode failed (-304): ‘string contains a disallowed character’
    --2017-10-02 00:19:52--  http://%E2%80%90%E2%80%90recursive/
    Resolving ‐‐recursive (‐‐recursive)... failed: Name or service not known.
    wget: unable to resolve host address ‘‐‐recursive’
    idn_encode failed (-304): ‘string contains a disallowed character’
    idn_encode failed (-304): ‘string contains a disallowed character’
    --2017-10-02 00:19:52--  http://%E2%80%90%E2%80%90no-clobber/
    Resolving ‐‐no-clobber (‐‐no-clobber)... failed: Name or service not known.
    wget: unable to resolve host address ‘‐‐no-clobber’
    idn_encode failed (-304): ‘string contains a disallowed character’
    idn_encode failed (-304): ‘string contains a disallowed character’
    --2017-10-02 00:19:52--  http://%E2%80%90%E2%80%90accept/
    Resolving ‐‐accept (‐‐accept)... failed: Name or service not known.
    wget: unable to resolve host address ‘‐‐accept’
    --2017-10-02 00:19:52--  http://jpg,gif,png,jpeg,mp3,mp3,pdf/
    Resolving jpg,gif,png,jpeg,mp3,mp3,pdf (jpg,gif,png,jpeg,mp3,mp3,pdf)... failed: Name or service not known.
    wget: unable to resolve host address ‘jpg,gif,png,jpeg,mp3,mp3,pdf’
    idn_encode failed (-304): ‘string contains a disallowed character’
    idn_encode failed (-304): ‘string contains a disallowed character’
    --2017-10-02 00:19:52--  http://%E2%80%90%E2%80%90directory-prefix=/mnt/source/downloads/lectures/
    Resolving ‐‐directory-prefix= (‐‐directory-prefix=)... failed: Name or service not known.
    wget: unable to resolve host address ‘‐‐directory-prefix=’
    --2017-10-02 00:19:52--  http://www.pravachanam.com/categorybrowselist/20q
    Resolving www.pravachanam.com (www.pravachanam.com)... 162.144.54.142
    Connecting to www.pravachanam.com (www.pravachanam.com)|162.144.54.142|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Saving to: ‘www.pravachanam.com/categorybrowselist/20q’
    
    www.pravachanam.com/categorybrowselis     [   <=>                                                                  ]  64.28K  97.2KB/s    in 0.7s    
    
    2017-10-02 00:19:55 (97.2 KB/s) - ‘www.pravachanam.com/categorybrowselist/20q’ saved [65824]
    In the above output the word "failed" appears many times, and so does the sentence "string contains a disallowed character". What do these mean? This time the tool at least tries to download something, which did not happen before.

    Please examine the above output and suggest how to avoid these errors.
    Regards,
    Rupesh.

  5. #5
    Join Date
    Aug 2011
    Location
    India
    Posts
    278

    Default Re: using wget as an offline browser to download all mp3 files from a website.

    After some time I examined the directory /mnt/source/downloads/lectures/ and found it empty. Where is wget storing the downloaded files?
    Regards,
    Rupesh.

  6. #6
    Join Date
    Sep 2012
    Posts
    7,093

    Default Re: using wget as an offline browser to download all mp3 files from a website.

    Quote: Originally posted by rupeshforu3
    Code:
    linux-ps66:~ # wget -c -t 0 -v --recursive --force-directories ‐‐recursive ‐‐no-clobber ‐‐accept jpg,gif,png,jpeg,mp3,MP3,pdf  ‐‐directory-prefix=/mnt/source/downloads/lectures/   http://www.pravachanam.com/categorybrowselist/20q
    idn_encode failed (-304): ‘string contains a disallowed character’
    idn_encode failed (-304): ‘string contains a disallowed character’
    --2017-10-02 00:19:52--  http://%E2%80%90%E2%80%90recursive/
    Resolving ‐‐recursive (‐‐recursive)... failed: Name or service not known.
    ...
    Check that your options start with a double dash and not with characters that merely look like one (a common issue when copy-pasting). In your case you are using the Unicode character U+2010 (HYPHEN) instead of the plain old ASCII hyphen-minus, 0x2d.
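
One way to spot such look-alike characters before running a command is to search the pasted text for bytes outside printable ASCII. This is only a sketch; the function name `has_non_ascii` is made up for this example, and it will also flag tabs and legitimately non-ASCII URLs.

```shell
#!/bin/sh
# has_non_ascii TEXT: succeeds (exit status 0) when TEXT contains any byte
# outside the printable ASCII range (space through tilde).
# LC_ALL=C makes grep compare single bytes, so a UTF-8 character such as
# U+2010 is seen as three non-ASCII bytes.
has_non_ascii() {
    printf '%s\n' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# Usage: paste the command to be checked between the single quotes.
if has_non_ascii 'wget --recursive --no-clobber'; then
    echo 'suspicious characters found'
else
    echo 'plain ASCII only'
fi
```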

  7. #7
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    29,738

    Default Re: using wget as an offline browser to download all mp3 files from a website.

    Sorry, I tried to post this yesterday, but something went wrong. This adds to what @avidjaar says. Below you can see that in several places your command contains not a typed - but a different character. If that happens often, it could very well explain many of the other problems you encounter.

    I copied your command from the post above:
    Code:
    wget -c -t 0 -v --recursive --force-directories ‐‐recursive ‐‐no-clobber ‐‐accept jpg,gif,png,jpeg,mp3,MP3,pdf  ‐‐directory-prefix=/mnt/source/downloads/lectures/   http://www.pravachanam.com/categorybrowselist/20q
    and put it in a file. When I list all the characters of the file with od, it shows a lot of strange characters in the command:
    Code:
    0000000   w   g   e   t       -   c       -   t       0       -   v    
    0000020   -   -   r   e   c   u   r   s   i   v   e       -   -   f   o
    0000040   r   c   e   -   d   i   r   e   c   t   o   r   i   e   s    
    0000060 342 200 220 342 200 220   r   e   c   u   r   s   i   v   e    
    0000100 342 200 220 342 200 220   n   o   -   c   l   o   b   b   e   r
    0000120     342 200 220 342 200 220   a   c   c   e   p   t       j   p
    0000140   g   ,   g   i   f   ,   p   n   g   ,   j   p   e   g   ,   m
    0000160   p   3   ,   M   P   3   ,   p   d   f         342 200 220 342
    0000200 200 220   d   i   r   e   c   t   o   r   y   -   p   r   e   f
    0000220   i   x   =   /   m   n   t   /   s   o   u   r   c   e   /   d
    0000240   o   w   n   l   o   a   d   s   /   l   e   c   t   u   r   e
    0000260   s   /               h   t   t   p   :   /   /   w   w   w   .
    0000300   p   r   a   v   a   c   h   a   n   a   m   .   c   o   m   /
    0000320   c   a   t   e   g   o   r   y   b   r   o   w   s   e   l   i
    0000340   s   t   /   2   0   q  \n
    0000347
    As you can see, the following byte sequence (in octal) appears several times:
    Code:
    342 200 220 342 200 220
    The equivalent in hex, byte by byte, is
    Code:
    e2 80 90 e2 80 90
    and that is what you see in your error messages as
    Code:
    /%E2%80%90%E2%80%90
    They are not the
    Code:
    --
    that should be there.
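
The same check can be reproduced without a file by piping a string straight into od. This is a sketch with the made-up helper name `hex_bytes`; the three bytes of U+2010 are written as octal escapes so that the example does not depend on how the hyphen survives copying.

```shell
#!/bin/sh
# hex_bytes TEXT: print the bytes of TEXT as space-separated hex pairs.
# od -An drops the offset column, -v disables "*" folding of repeated
# lines, and -tx1 prints one byte at a time, so no byte swapping occurs.
hex_bytes() {
    # Unquoted on purpose: word splitting collapses od's padding and newlines.
    set -- $(printf '%s' "$1" | od -An -v -tx1)
    printf '%s\n' "$*"
}

hex_bytes '--'                                    # prints: 2d 2d
hex_bytes "$(printf '\342\200\220\342\200\220')"  # prints: e2 80 90 e2 80 90
```

Printing single bytes with `-tx1` avoids the swapped pairs that a two-byte hex dump shows on little-endian machines.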
    Last edited by hcvv; 02-Oct-2017 at 00:17.
    Henk van Velden

  8. #8
    Join Date
    Aug 2011
    Location
    India
    Posts
    278

    Default Re: using wget as an offline browser to download all mp3 files from a website.

    Please tell what to do now.
    Regards,
    Rupesh.

  9. #9
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    29,738

    Default Re: using wget as an offline browser to download all mp3 files from a website.

    Quote: Originally posted by rupeshforu3
    Please tell what to do now.
    Isn't that clear? You should type the statement fresh on your keyboard and not use copy/paste from some untrusted source.
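
If retyping a long command is impractical, the look-alike hyphens can also be replaced mechanically. This is a minimal sketch and not something suggested in the thread: `fix_hyphens` is a made-up name, and it only handles U+2010, not other dash-like characters that a web page might substitute.

```shell
#!/bin/sh
# fix_hyphens TEXT: print TEXT with every U+2010 (UTF-8 bytes e2 80 90,
# octal 342 200 220) replaced by the ASCII hyphen-minus.
fix_hyphens() {
    u2010=$(printf '\342\200\220')
    # LC_ALL=C makes sed treat the pattern and the input as plain bytes.
    printf '%s\n' "$1" | LC_ALL=C sed "s/$u2010/-/g"
}

pasted="$(printf '\342\200\220\342\200\220')no-clobber"
fix_hyphens "$pasted"    # prints: --no-clobber
```

It is still worth reading the repaired command before running it, since other characters may have been altered as well.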
    Henk van Velden

  10. #10
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    29,738

    Default Re: using wget as an offline browser to download all mp3 files from a website.

    Quote: Originally posted by pwilson
    READ THE MAN PAGE AND PAY ATTENTION!!!!!

    Sorry, but really?!?!?!?! You're putting --recursive in twice, didn't specify a URL, and aren't bothering to actually try anything. You're just asking someone to tell you "type this in and press ENTER, and your problem is solved".
    Please tone it down. The duplicated --recursive does not do much harm. There was initially no URL, but there is one now. The wrong type of - is the main cause of the failure of this command. If, after carefully reading what the other posts say about U+2010 being used instead of U+002D, you think those posters are wrong, then please post your arguments against that analysis.

    If you cannot answer in a normal way, please don't. You are not obliged to answer. When you are frustrated, go somewhere else, take a walk, drink a beer, whatever.
    Henk van Velden
