Thread: wget mirror downloading excluded files

  1. #1
    Join Date
    Jan 2018
    Location
    Annandale, VA
    Posts
    143

    wget mirror downloading excluded files

    I'm trying to mirror a repository with the following command:
    Code:
    wget --mirror --cut-dirs=0 -e robots=off --no-host-directories --no-parent --recursive -A rpm -X "*00archived*" -X "*debuginfo*","*debugsource*" -X "**/SRPMS/" -R src.rpm --directory-prefix="/run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo" -N http://rpm.netlabs.org/release
    followed by a createrepo. I want to drop all of the 00archived and SRPMS subdirectories, but they are getting downloaded:
    Code:
    --2019-10-03 14:22:40--  http://rpm.netlabs.org/release
    Resolving rpm.netlabs.org (rpm.netlabs.org)... 213.238.45.91
    Connecting to rpm.netlabs.org (rpm.netlabs.org)|213.238.45.91|:80... connected.
    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: http://rpm.netlabs.org/release/ [following]
    --2019-10-03 14:22:40--  http://rpm.netlabs.org/release/
    Reusing existing connection to rpm.netlabs.org:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 330 [text/html]
    Saving to: ‘/run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo/release.tmp’
    
         0K                                                       100% 9.59M=0s
    
    Last-modified header missing -- time-stamps turned off.
    2019-10-03 14:22:40 (9.59 MB/s) - ‘/run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo/release.tmp’ saved [330/330]
    
    Removing /run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo/release.tmp since it should be rejected.
    ...
    --2019-10-03 14:23:04--  http://rpm.netlabs.org/release/00Archived/SRPMS/libtool-2.4.2-3.oc00.i386.rpm
    Reusing existing connection to rpm.netlabs.org:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 632828 (618K) [text/plain]
    Saving to: ‘/run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo/release/00Archived/SRPMS/libtool-2.4.2-3.oc00.i386.rpm’
    
         0K .......... .......... .......... .......... ..........  8%  306K 2s
        50K .......... .......... .......... .......... .......... 16%  458K 1s
       100K .......... .......... .......... .......... .......... 24%  448K 1s
       150K .......... .......... .......... .......... .......... 32%  465K 1s
       200K .......... .......... .......... .......... .......... 40%  261K 1s
       250K .......... .......... .......... .......... .......... 48%  445K 1s
       300K .......... .......... .......... .......... .......... 56%  467K 1s
       350K .......... .......... .......... .......... .......... 64%  470K 1s
       400K .......... .......... .......... .......... .......... 72%  472K 0s
       450K .......... .......... .......... .......... .......... 80%  476K 0s
       500K .......... .......... .......... .......... .......... 88%  348K 0s
       550K .......... .......... .......... .......... .......... 97% 2.54M 0s
       600K .......... .......                                    100%  197K=1.5s
    
    2019-10-03 14:23:06 (419 KB/s) - ‘/run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo/release/00Archived/SRPMS/libtool-2.4.2-3.oc00.i386.rpm’ saved [632828/632828]
    
    --2019-10-03 14:23:06--  http://rpm.netlabs.org/release/00Archived/SRPMS/libtool-ltdl-2.4.2-3.oc00.i386.rpm
    Reusing existing connection to rpm.netlabs.org:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 29085 (28K) [text/plain]
    Saving to: ‘/run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo/release/00Archived/SRPMS/libtool-ltdl-2.4.2-3.oc00.i386.rpm’
    
         0K .......... .......... ........                        100% 3.16M=0.009s
    
    2019-10-03 14:23:06 (3.16 MB/s) - ‘/run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo/release/00Archived/SRPMS/libtool-ltdl-2.4.2-3.oc00.i386.rpm’ saved [29085/29085]
    What am I doing wrong, and what is the correct syntax to exclude those directories, as well as the debug directories given in the command? Thanks.

  2. #2
    Join Date
    Sep 2012
    Posts
    5,230

    Re: wget mirror downloading excluded files

    Where in the wget documentation does it say that you can use patterns with the -X option?

  3. #3
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    25,378

    Re: wget mirror downloading excluded files

    Originally posted by shmuelmetz:
    I'm trying to mirror a repository with the following command:
    Code:
    ... -X "*00archived*"
    followed by a createrepo. I want to drop all of the 00archived and SRPMS subdirectories, but they are getting downloaded:
    Code:
    ....
    --2019-10-03 14:23:06--  http://rpm.netlabs.org/release/00Archived/SRPMS/libtool-ltdl-2.4.2-3.oc00.i386.rpm
    Reusing existing connection to rpm.netlabs.org:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 29085 (28K) [text/plain]
    Saving to: ‘/run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo/release/00Archived/SRPMS/libtool-ltdl-2.4.2-3.oc00.i386.rpm’
    I hope I understand correctly that this is what you are referring to. If so, I hope you now see the difference between an a and an A: the pattern says 00archived, but the directory is 00Archived.
    Henk van Velden

  4. #4
    Join Date
    Jan 2018
    Location
    Annandale, VA
    Posts
    143

    Re: wget mirror downloading excluded files

    Originally posted by arvidjaar:
    Where in the wget documentation does it say that you can use patterns with the -X option?
    "Specify a comma-separated list of directories you wish to exclude from download (see Directory-Based Limits). Elements of list may contain wildcards."

  5. #5
    Join Date
    Jan 2018
    Location
    Annandale, VA
    Posts
    143

    Re: wget mirror downloading excluded files

    Originally posted by hcvv:
    I hope I understand correctly that this is what you are referring to. If so, I hope you now see the difference between an a and an A.
    Thanks; that's part of the problem. But I still don't understand why the SRPMS directory is not skipped.

  6. #6
    Join Date
    Jun 2008
    Location
    San Diego, Ca, USA
    Posts
    11,461
    Blog Entries
    2

    Re: wget mirror downloading excluded files

    Originally posted by shmuelmetz:
    Thanks; that's part of the problem. But I still don't understand why the SRPMS directory is not skipped.
    I'm guessing here: maybe a leading forward slash is missing?
    I don't usually use the double-star glob. Since it is supposed to stand for any number of directory levels, I'd think the pattern needs a leading slash to mark it as a directory path... or I'm babbling...
    Code:
     -X "/**/SRPMS/"
    TSU

  7. #7
    Join Date
    Sep 2012
    Posts
    5,230

    Re: wget mirror downloading excluded files

    Originally posted by shmuelmetz:
    But I still don't understand why the SRPMS directory is not skipped.
    wget compares the full directory path, starting from the server root, and uses shell-style globbing, in which wildcard characters do not match the directory separator. So *SRPMS* does not match /release/00Archived/SRPMS, just as it would not match in a normal shell.
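
    To see the same behaviour in a plain shell, here is a small sketch. It only recreates the directory names from this thread in a temporary directory; the helper count_matches is made up for the illustration and is not part of wget.

```shell
#!/bin/sh
# Sketch: shell pathname expansion never lets '*' cross a '/'.
# The directory layout below only mimics the paths from this thread.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/release/00Archived/SRPMS"
cd "$demo_root" || exit 1

# count_matches GLOB: print how many existing paths GLOB expands to.
# (hypothetical helper, for illustration only)
count_matches() {
    n=0
    for p in $1; do            # $1 is left unquoted on purpose so it expands
        if [ -e "$p" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

count_matches '*SRPMS*'           # prints 0: '*' stops at '/'
count_matches '*/*/SRPMS'         # prints 1: one '*' per directory level
count_matches 'release/*/SRPMS'   # prints 1
```

    So a glob needs one wildcard per directory level to reach SRPMS, which is why a regex is simpler when the depth varies.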

    Use --reject-regex ".*/SRPMS/.*" if you want to skip SRPMS everywhere.
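
    A rough way to check such a regex before starting a long mirror run is to try it against a few URLs with grep -E. This is only a sketch: it assumes grep -E behaves like wget's default POSIX regex matching for a pattern this simple, the 00Archived alternative is my own extension of the pattern above, and the second URL is a made-up path. The wget line in the comment is untested.

```shell
#!/bin/sh
# Sketch: dry-run a reject regex against sample URLs, no network access.
# The (SRPMS|00Archived) alternation is an assumed extension of the
# pattern suggested above; matching is case-sensitive.
REJECT_RE='.*/(SRPMS|00Archived)/.*'

# would_reject URL: print "reject" if URL matches REJECT_RE, else "keep".
# (hypothetical helper, for illustration only)
would_reject() {
    if printf '%s\n' "$1" | grep -Eq "$REJECT_RE"; then
        echo reject
    else
        echo keep
    fi
}

# URL taken from the log in the first post.
would_reject 'http://rpm.netlabs.org/release/00Archived/SRPMS/libtool-2.4.2-3.oc00.i386.rpm'
# Made-up path, only to show a URL that the regex leaves alone.
would_reject 'http://rpm.netlabs.org/release/example/i386/foo.rpm'

# The mirror command might then look like this (untested):
#   wget --mirror --no-parent --no-host-directories -e robots=off \
#        -A rpm --reject-regex "$REJECT_RE" \
#        --directory-prefix="/run/media/root/DFS1JFS64/Vendors/Netlabs/Netlabsrepo" \
#        http://rpm.netlabs.org/release/
```

    The first call prints reject and the second prints keep. The same approach could be used for the debuginfo and debugsource directories by adding them to the alternation.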
