Using rsync to clone http software repositories to a local folder

I’m still on a quest to switch from Windows to OpenSuse, hopefully this year. I have mostly solved my issues and differences with Windows, except for one remaining concern. On Windows I keep the installers for every program I add, so if a website goes down or my internet connection fails I can still reinstall my system and applications. Linux uses software repositories, so it’s hard to be sure that if my internet goes down for a day (or some custom repository disappears temporarily or permanently) I won’t lose my software and the ability to install it again. Keeping individual RPMs is hard because of dependencies, and since I have enough disk space I thought the best way to deal with this is to clone the software repositories to my drive and keep them up to date whenever possible.

The best tool I found is rsync, which seems just perfect for what I want. But I can’t work out how to configure it so that it clones software repositories over HTTP. I searched a lot on Google and asked on IRC (someone posted a wiki page, but I didn’t find the answer there). It explains how to clone from an rsync repository and how to use rsync to synchronize one local folder with another, but not how to download an http / ftp folder structure to a local folder. In the examples I have seen, you use something like rsync.opensuse.org::OpenSuse, where the :: doesn’t make sense to me since it’s not part of the URL, so it doesn’t work with normal paths. Basically, what I’m looking for is a command like this:

rsync -options --delete download.opensuse.org/distribution/12.1/repo/oss/ /home/mircea/repositories/opensuse-12.1-oss/

Does anyone know the proper command line? Yes, I know there is also a rsync.opensuse.org page, but I also want to use this for repositories that might not have a rsync server, so I just download the whole folder structure instead. Also I know the official repositories are large, but as long as they’re not over 100GB I can handle. This is important for me so I can use OpenSuse safely, so please let me know how it’s done.

Have a look at my bash script intended to do what you are asking about here:

S.L.R.C. - SuSE Local Repository Creator - Version 1.20 - Now for Packman & openSUSE 11.4 & 12.1 - Blogs - openSUSE Forums

Thank You,

Thanks, that’s also very useful. Since I’m still new and don’t feel like modding other users’ scripts for now (and rsync looks great for this too), I’d still like to know exactly what the rsync command is to clone a simple http or ftp folder structure. If I’m not comfy with that I might try other scripts for this purpose, but I’d like to try this first.

I have not tried it myself, but the first thing I see in your command is
that you do not use a proper URL


download.opensuse.org/distribution/12.1/repo/oss/

so you should at least access an HTTP resource by using the full http:// URL.


http://download.opensuse.org/distribution/12.1/repo/oss/


PC: oS 12.1 x86_64 | i7-2600@3.40GHz | 16GB | KDE 4.8.3 | GeForce GT 420
ThinkPad E320: oS 12.1 x86_64 | i3@2.30GHz | 8GB | KDE 4.8.3 | HD 3000
eCAFE 800: oS 12.1 i586 | AMD Geode LX 800@500MHz | 512MB | KDE 3.5.10

Sorry, that’s because I tried it last night and rsync always said http:// is not recognized. So it gave me the impression that it adds that on its own. But I have already tried both with and without, and neither works.

Did you look into HTTrack? I have had good experiences using it to mirror (parts of) websites to my system.
The website is here: HTTrack Website Copier - Free Software Offline Browser (GNU GPL)
And the openSUSE installables are here: software.opensuse.org

Also wget might be an option for you (it is already on your openSUSE):

man wget

I make my backups over the LAN using rsync/rsyncd, but that needs rsyncd running on the server side, and I doubt that software.opensuse.org runs such an rsync server. It is an HTTP server.

These are different protocols (and they use different default ports, 80 and 873 respectively).

Taking a quick, closer look: isn’t it something like this, for example?


rsync -rltp \
rsync://ftp5.gwdg.de/pub/opensuse/distribution/12.1/repo/oss \
~/repo/

That seems to work for me (of course, choose your favorite mirror with
rsync support and a local path of your choice).
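
For keeping such a copy up to date later on, the command above could be wrapped roughly like this. It is only a sketch: the function name `sync_repo` and the `DRY_RUN` variable are made up for illustration, and the source has to be an rsync location, not an http:// URL. Since `--delete` removes local files that have vanished from the mirror, a dry run first seems prudent.

```shell
# Hypothetical helper (not part of rsync): sync_repo SOURCE DEST
# SOURCE must be an rsync:// or host::module location.
sync_repo() (
    # A trailing slash on the source copies the directory's contents
    # instead of creating an extra directory level inside DEST.
    src="${1%/}/"
    dest="${2%/}/"
    set -- -rltp --delete
    # DRY_RUN is an assumed convention of this sketch, not an rsync feature.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        set -- "$@" --dry-run --verbose
    fi
    mkdir -p "$dest" && rsync "$@" "$src" "$dest"
)
```

It could be called as `DRY_RUN=1 sync_repo rsync://ftp5.gwdg.de/pub/opensuse/distribution/12.1/repo/oss ~/repo/oss` to preview the changes, and then again without `DRY_RUN=1` to apply them.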



To come back to where your misunderstanding lies:
rsync does NOT use the HTTP protocol, so adding http:// will not help (as you saw, it only makes the parameter’s syntax unintelligible to rsync), and omitting it will not make rsync assume it as a default the way a web browser does.

They seem to run an rsync daemon then, but I guess that they are about the only one of all the mirrors.
Having said that, GWDG is of course one of the most reliable mirrors of all.

And @MirceaKitsune, did you study

man rsync

There is at the very beginning:


Access via remote shell:
  Pull: rsync [OPTION...] [USER@]HOST:SRC... [DEST]
  Push: rsync [OPTION...] SRC... [USER@]HOST:DEST

Access via rsync daemon:
  Pull: rsync [OPTION...] [USER@]HOST::SRC... [DEST]
        rsync [OPTION...] rsync://[USER@]HOST[:PORT]/SRC... [DEST]
  Push: rsync [OPTION...] SRC... [USER@]HOST::DEST
        rsync [OPTION...] SRC... rsync://[USER@]HOST[:PORT]/DEST

It shows that every syntax involving a remote system has at least one : somewhere in the parameter. Thus a hostname simply followed by a / and a path will be interpreted as a local path only (and I assume that a directory named software.opensuse.org does not exist on your system).
And as said above, the protocol prefix http:// is not mentioned at all, but rsync:// is.
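
To make that rule concrete, here is a small sketch that mimics the distinction. The function `classify_source` is hypothetical and only approximates the synopsis above; the http/ftp branch is an addition of this sketch to flag URLs that belong to wget or curl rather than rsync.

```shell
# Hypothetical helper: report roughly how a source argument would be treated.
classify_source() {
    case "$1" in
        rsync://*) echo "rsync daemon" ;;
        # Not in the rsync synopsis at all: use an HTTP/FTP downloader instead.
        http://*|https://*|ftp://*) echo "not an rsync source" ;;
        *)
            # Only the text before the first slash decides whether a host is meant.
            case "${1%%/*}" in
                *::*) echo "rsync daemon" ;;
                *:*)  echo "remote shell" ;;
                *)    echo "local path" ;;
            esac ;;
    esac
}
```

With this, `classify_source download.opensuse.org/distribution/12.1/repo/oss/` reports a local path, while `classify_source rsync.opensuse.org::OpenSuse` reports an rsync daemon source.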

On 05.06.2012 15:06, hcvv wrote:
> They seem to run an rsync daemon then, but I guess that they are
> about the only one of all the mirrors.
> Having said that, GWDG is of course one of the most reliable mirrors of
> all.
>
>
I do not know how reliable it is, but I used the information at
http://mirrors.opensuse.org/
where some rsync mirrors are listed for different regions.



On 2012-06-05 12:46, MirceaKitsune wrote:

> Does anyone know the proper command line? Yes, I know there is also a
> rsync.opensuse.org page, but I also want to use this for repositories
> that might not have a rsync server, so I just download the whole folder
> structure instead. Also I know the official repositories are large, but
> as long as they’re not over 100GB I can handle. This is important for me
> so I can use OpenSuse safely, so please let me know how it’s done.

To sync over http you use a normal http downloader, like wget or curl - not
rsync. To use rsync you connect to a machine that runs the rsync protocol,
which is why they told you to use “rsync.opensuse.org::OpenSuse” instead.
Left of the “::” is the address, and right is the resource. The “::” has a
special meaning re rsync that is in the manual.
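
For the HTTP side, a wget-based mirror might look roughly like the sketch below. The function name `mirror_http` and its arguments are made up for illustration; the wget options themselves are standard ones. As far as I know wget has no counterpart to rsync’s `--delete`, so files removed upstream stay in the local copy.

```shell
# Hypothetical helper: mirror_http URL DEST [CUT_DIRS]
mirror_http() (
    url="${1%/}/"          # trailing slash so wget treats it as a directory
    dest="$2"
    cut_dirs="${3:-0}"     # number of leading remote path components to drop
    # --mirror turns on recursion and timestamping, --no-parent stays below URL,
    # --reject drops the generated directory index pages after parsing them.
    wget --mirror --no-parent --no-host-directories \
         --cut-dirs="$cut_dirs" --reject 'index.html*' \
         --directory-prefix="$dest" "$url"
)
```

For the repository from the first post this would be something like `mirror_http http://download.opensuse.org/distribution/12.1/repo/oss/ /home/mircea/repositories/opensuse-12.1-oss 4`, where 4 strips the `distribution/12.1/repo/oss` components.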

Hint: there is a syntax with rsync that will list what resources are
available on an rsync server.
Hint: I don’t tell you which because then I would have to read the manual -
instead you read it for me :wink:

This will allow you to download only what you want.
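
For anyone who would rather not dig through the manual: the listing syntax is, to my knowledge, simply the host followed by `::` with nothing after it, and a source without a destination lists files instead of copying them. A sketch with hypothetical wrapper names:

```shell
# Hypothetical wrappers around the rsync daemon listing syntax.

# List the modules a daemon offers: list_modules HOST
list_modules() {
    rsync "${1}::"
}

# List the top level of one module: list_module HOST MODULE
list_module() {
    rsync "${1}::${2}/"
}
```

For example `list_modules rsync.opensuse.org`, then `list_module rsync.opensuse.org OpenSuse`; the module name here is just the one quoted in this thread and may differ on the real server.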

Re your original problem, what I do is that I keep downloaded packages,
which is a per repo setting in zypper or yast.
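
A sketch of that setting on the command line, assuming zypper’s `modifyrepo` command with its `--keep-packages` option; the function names are made up, and the commands need root:

```shell
# Hypothetical wrappers; run as root.

# Keep downloaded RPMs for every configured repository.
keep_packages_all() {
    zypper modifyrepo --keep-packages --all
}

# Keep downloaded RPMs for a single repository: keep_packages_for ALIAS
keep_packages_for() {
    zypper modifyrepo --keep-packages "$1"
}
```

The kept packages should then accumulate under /var/cache/zypp/packages, which is the default cache location as far as I know.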


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

On 2012-06-05 15:27, Carlos E. R. wrote:
> This will allow you to download only what you want.

I forgot to mention. To reduce load on rsync servers, it is polite to
download first using wget, and when your mirror is populated then you
activate rsync to keep it.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

There seem to be many more rsync-enabled mirrors than I expected.

Carlos E. R. wrote:
> I forgot to mention. To reduce load on rsync servers, it is polite to
> download first using wget, and when your mirror is populated then you
> activate rsync to keep it.

Really? I’ve never heard that. What is its source, please?

On 2012-06-06 12:24, Dave Howorth wrote:
> Carlos E. R. wrote:
>> I forgot to mention. To reduce load on rsync servers, it is polite to
>> download first using wget, and when your mirror is populated then you
>> activate rsync to keep it.
>
> Really? I’ve never heard that. What is its source, please?

Do you really need a source for that?
rsync is heavy on the server, that’s why few offer that service.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Carlos E. R. wrote:
> On 2012-06-06 12:24, Dave Howorth wrote:
>> Carlos E. R. wrote:
>>> I forgot to mention. To reduce load on rsync servers, it is polite to
>>> download first using wget, and when your mirror is populated then you
>>> activate rsync to keep it.
>> Really? I’ve never heard that. What is its source, please?
>
> Do you really need a source for that?
> rsync is heavy on the server, that’s why few offer that service.

Yes, please. I use rsync a lot and have never come across such an
instruction. And if I needed to protect against excessive load, why
wouldn’t I use rsync’s built-in --bwlimit=KBPS option, instead of going
to a completely separate program?
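
For reference, throttling with rsync itself might look like this. It is a sketch only: the function name `sync_limited` and the default of 500 are arbitrary choices for illustration, with the limit given in kilobytes per second as the option’s KBPS argument suggests.

```shell
# Hypothetical helper: sync_limited SOURCE DEST [KBPS]
sync_limited() (
    limit="${3:-500}"   # assumed default, pick whatever suits your line
    rsync -rltp --bwlimit="$limit" "${1%/}/" "${2%/}/"
)
```

For example `sync_limited rsync://ftp5.gwdg.de/pub/opensuse/distribution/12.1/repo/oss ~/repo/oss 200`.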

On 2012-06-06 12:41, Dave Howorth wrote:
> Carlos E. R. wrote:
>> On 2012-06-06 12:24, Dave Howorth wrote:

>> Do you really need a source for that?
>> rsync is heavy on the server, that’s why few offer that service.
>
> Yes, please. I use rsync a lot and have never come across such an
> instruction. And if I needed to protect against excessive load, why
> wouldn’t I use rsync’s built-in --bwlimit=KBPS option, instead of going
> to a completely separate program?

I don’t have a source for that; I read it some time ago in the rules of some
openSUSE mirror.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Carlos E. R. wrote:
> On 2012-06-06 12:41, Dave Howorth wrote:
>> Carlos E. R. wrote:
>>> On 2012-06-06 12:24, Dave Howorth wrote:
>
>>> Do you really need a source for that?
>>> rsync is heavy on the server, that’s why few offer that service.
>> Yes, please. I use rsync a lot and have never come across such an
>> instruction. And if I needed to protect against excessive load, why
>> wouldn’t I use rsync’s built-in --bwlimit=KBPS option, instead of going
>> to a completely separate program?
>
> I don’t have a source for that; I read it some time ago in the rules of some
> openSUSE mirror.

OK. Thanks.

On 2012-06-06 13:15, Dave Howorth wrote:
> Carlos E. R. wrote:

> OK. Thanks.

I read that years ago; maybe they were more resource-limited back then and
asked for that. I remember reading it, but I can’t remember exactly where.

The openSUSE server does have a limitation of 50 connections
here.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)