Hi, I am Rupesh from India. I have downloaded a large number of mp3 files from the internet and I want to create a database of these files.
Recently I came across a website which contains some important lectures and decided to download the site. The website holds up to 60,000 mp3 files, of which I need 13,000, and all the files I need sit under a single directory, that is, under one web page and its sub-pages. I have downloaded 11,000 of the mp3 files I need, about 185 GB, using an offline browser called Extreme Picture Finder; the remaining 2,000 files have not been downloaded. Extreme Picture Finder fetched those files quickly and accurately, but for the remaining 2,000 files it was taking a very long time, so I stopped the program. I want to download the remaining 2,000 files manually, that is, without any offline browser.
The website from which I have downloaded the mp3 files is very nicely organized. The administrators have arranged the web pages well: for example, a page will state that the sub-pages under it contain, say, 50 directories and 1,000 files. I want to create a database file that reflects the same structure.
Extreme Picture Finder, running on Windows, downloaded the files to the folder e:/downloads/ with exactly the same directory structure as the source website. Is there any way to create a database of all the folders and files under this folder? I think we could create an MS Access database, so that I can copy the resulting database file, say copied.mdb, to my tablet and open it in Android office applications like Kingsoft.
What I am expecting is this: I open the database file in an Android application like Kingsoft and examine the properties of a folder, and suppose it shows that the folder contains 100 files. I then open the website address for that directory in Firefox on Android, and if that address shows 160 files, I try to download the remaining 60. With 20 to 30 such passes I could download all the remaining 2,000 files.
For Windows and Linux there are programs such as MediaInfo and MediaConch which are somewhat useful for this particular need, and on Windows I have found an application called Playtime which will create an Excel file listing all the mp3 files and their properties.
In Linux there is a command-line tool called ls (list) with the option -R (recursive). Is it possible to create a database using such command-line utilities?
Please suggest a way to create a database, ideally MS Access or at least some other SQL database.
A lot depends on the naming convention they have used; if they have used underscores (_) in all the filenames rather than spaces:
ls -l > list.txt
will create a list of all the files in a folder. This will contain runs of multiple spaces; reduce these by searching for two spaces and replacing with one until only single spaces remain in the file. Then search for a space and replace with a comma (,) and you have a CSV file which you can import into any database program.
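If you would rather squeeze the spaces on the command line instead of in an editor, a minimal sketch of the same idea (assuming the filenames themselves contain no spaces) is:
ls -l | tr -s ' ' ',' > list.csv
Here tr -s turns each run of spaces into a single comma in one step, so the result can be imported directly as CSV.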
You can list them all with:
ls -lR > list.txt
which will give you a recursive list of the files in each folder; however, unless you decide to use this as a quick way to get them all and then deal with each folder separately, you will have to decide how you want to distinguish the folders in this long file.
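If picking the folder headings out of the ls -lR output becomes awkward, find can print one full path per line, which imports more cleanly. A sketch, assuming the download root is /path/to/downloads (substitute your own location) and GNU find for the -printf option:
find /path/to/downloads -type f -name '*.mp3' -printf '%p,%s\n' > list.csv
Each line then carries the full path and the file size in bytes, separated by a comma, ready for import into a spreadsheet or database.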
The files which I have downloaded from the website are not copyrighted, and the site distributes them freely. The website clearly states: “if anyone finds any file which is copyrighted, please report it immediately and we will remove it”.
Can you suggest how to create a database, or at least an Excel file, from the output of ls -lR?
Rupesh,
It would help both yourself and those wishing to help if you would explain what it is that you are trying to achieve and why.
In the last few months you have asked for, and been given, advice on using “mediainfo” and a Java CSV file comparison tool. You are now asking about a “database”. That normally implies a database manager (e.g. MariaDB, Berkeley DB) used to operate on more than one type of data (e.g. correlating audio files with keywords, artists, etc.). You have given no indication of either.
At its simplest a text file “database” of your mp3 filenames could be created with:
> ls -R "path to top level mp3 directory" > LectureFilenames.text
but I do not expect this would be very useful.
If you just want to have a local copy of these lectures for off-line private study, and they are well laid out on the source web-site, why not consider making a local mirror of (part of) the original site using “wget” or, if the site is designed for it, “rsync”?
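A minimal wget sketch of such a mirror, where the URL and the mp3-only filter are placeholders to be adjusted for the real site:
wget --mirror --no-parent --continue --accept mp3 https://example.com/lectures/
--mirror turns on recursion with timestamping, --no-parent keeps wget from wandering above the lectures directory, --continue resumes partially downloaded files, and --accept mp3 discards everything that is not an mp3.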
Generally speaking,
Nowadays people expect more than just storing media content in a database.
They want the files to be searchable by metadata specific to the type of media: author, time, titles, lyrics, album, and more.
They often want to be able to create custom playlists.
They often want to create custom categories, i.e. “tagging”.
To do the above,
You need to look **first** at user apps that support the features you want, not at how the content is stored.
And, consider how your content will be accessed. Are you interested only in a locally running app? Or, perhaps a Home Entertainment system might be better as a centralized repository for all your devices?
As for storing your files in a database,
You’d have to state your reasons for doing so, and perhaps more specifically your objectives.
For instance,
Less storage used
Faster search
Security
But, of course, you should consider how you want to retrieve content from your database; if an app isn’t already available, would a very generic interface be sufficient? Do you have the skill or initiative to learn how to write the code?
I love to listen to music and also to lectures. I am very particular about how many music files and how many lectures I have. Whenever I download music or lectures from the internet, I have the habit of checking how many files were downloaded and whether each one came down completely or only partially. Previously I downloaded some music files from the internet, compared my downloaded files with the files on the source website, and found that they differed in size.
The files I am trying to download from this website are important to me, so I am working hard to get every file I want. Surprisingly, the total size of the website is roughly 1.5 TB, of which I want to download up to 230 GB. It is not possible to download 230 GB in one day, and even offline browsers take a lot of time.
If I can find a way to check how many folders and files are present under the root of my download directory, I can download all the remaining files quite easily. At present I have an Android phone, but it doesn’t have 250 GB of storage, so I think that if I can obtain a database it will be easier to compare the local files with the source website.
I have now got an idea: if we can obtain a list of files using the ls -R command and store the result in a text file, then we can use touch to recreate the files in the same directory structure. I think the touch command will just create each file with zero size.
By using the ls and touch commands I can recreate the complete set of files with the same directory structure, zip the resulting folder, copy the roughly 1 MB zip to the Android phone, and then examine which files are missing. Please give me some suggestions on this idea.
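A minimal sketch of that idea, assuming the downloads sit under ~/downloads, the skeleton is built under ~/skeleton (both paths are placeholders), and GNU find, which substitutes {} even inside a longer argument:
cd ~/downloads
# recreate every sub-directory under the skeleton location
find . -type d -exec mkdir -p "$HOME/skeleton/{}" \;
# create a zero-byte placeholder for every file
find . -type f -exec touch "$HOME/skeleton/{}" \;
# pack the skeleton so it can be copied to the phone
cd "$HOME" && zip -r skeleton.zip skeleton
Because the placeholders are empty, the zip stays tiny even though it mirrors the layout of 185 GB of real files.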
Use something like HTTrack or even the basic wget command to download your files.
As you might imagine wget is about as fast as you can get with minimal overhead, but unlike HTTrack has limited download options.
IIRC HTTrack website copying can be interrupted and resumed at a later time.
Once downloaded,
You can use an app like Calibre for organization and access.
Is it possible to sync a local directory with a folder on the website’s server? I mean, suppose the local directory has 50 sub-directories and 1,500 files and the folder on the server has 150 sub-directories and 8,000 files; is there any method to get the remaining 100 directories and 6,500 files?
I have not used wget previously; does it function like an offline browser? In an offline browser, if we provide the website address and some options like spidering, it will download files accordingly. Is there any GUI for wget, and if so, which is the best one?
HTTrack is very difficult to understand, so can you suggest another tool?
I asked in my previous post whether the combination of ls -R and touch might be suitable for my current need, but you have not answered.
“Sync” has a special meaning, including detecting changes between the local and remote copy, and doing whatever is necessary to ensure both have the same content. It’s not exactly what you really want.
You really want to replicate one thing (the remote files you want) so that you have an exact copy. Once it’s done, you probably don’t want to continue to compare your local and remote copies.
IIRC HTTrack and similar tools will do what you ask: determine the difference between what you have locally and what is available on the remote website, then download the files you don’t have.
wget is not any kind of browser. wget is literally “web get”: it goes and gets specified files and saves a copy locally. It is a simple and powerful tool which can be configured to fetch complex collections of files to your machine. There might be a UI for wget, but I’ve never looked for one because this is one of those tools where the raw power of the command line doesn’t translate well into a GUI. Better instead to find an online tutorial or guide to build the specific command which does the task you need. There is a strong likelihood that many, if not most, web scraping tools use wget below the surface (more on web scraping below).
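As an illustration of that configurability, here is a hedged sketch of a command that recursively fetches only the mp3 files you do not already have; the URL is a placeholder for the real site:
wget --recursive --no-parent --no-clobber --accept mp3 https://example.com/lectures/
--no-clobber makes wget skip files that already exist locally, so re-running the same command after an interruption only fetches what is still missing.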
I personally haven’t used any other GUI scraping tool but it looks like dozens of new tools have been created within the past few years. Maybe they’re more intuitive to you. Just Google “website scraping software free”
As you described touch only creates an empty file. You want to copy the file from a remote location so you have the full file with its content.
May I know the difference between web scraping, offline browsing, and the rest? I am hearing the term web scraping for the first time, so can you suggest a link describing it? Searching the web, I found that web scraping plugins are available for Firefox; does such a plugin download all the files from a website?
By using the combination of ls -R and touch, what I am expecting is to generate a list of the download links I have not yet downloaded, store the result in a text file, and later download those files using a download manager such as wget.
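A sketch of how such a list could be built and handed to wget, assuming the downloads live under ~/downloads, that the remote file paths have already been saved one per line in remote.txt (for example by scraping the site’s index pages), and that https://example.com/lectures/ stands in for the real base URL; all three are placeholders:
# relative paths of the files already downloaded
(cd ~/downloads && find . -type f -name '*.mp3' | sed 's|^\./||' | sort) > local.txt
# paths that exist remotely but not locally
sort remote.txt | comm -13 local.txt - > missing.txt
# turn the paths into URLs and let wget fetch them
sed 's|^|https://example.com/lectures/|' missing.txt > missing_urls.txt
wget --continue --input-file=missing_urls.txt
comm -13 prints only the lines that appear in the second input, i.e. the files still to be fetched.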
Scraping usually means grabbing small amounts of information, like a web page or part of one. At the moment you can try this Firefox add-on (it will work until Firefox 57, when Mozilla will kill old-style add-ons): https://addons.mozilla.org/en-US/firefox/addon/scrapbook-x/
The other ScrapBook add-ons seem to have been abandoned.
Offline browsing usually means downloading whole websites and a lot of information, most of which is unneeded.
About your issue: I think ffmpeg (or mediainfo) and a shell script would do it, but I don’t know that much scripting. You can try the MediaInfo GUI (I do know it can export data as CSV; I’m just not sure whether it can open multiple files at once).
I just checked, and mediainfo-gui can open a folder of media files and show detailed information about them; by default the duration isn’t shown, but that can be customized under Options → Preferences.
To show all the opened media files you need to choose View → Sheet.
Then you can do File → Export and generate a CSV file that can be opened and edited in LibreOffice Calc and saved in xls or ods format.
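If you later want the same CSV from a script rather than from the GUI, the mediainfo command-line tool accepts a custom --Inform template. A sketch, assuming the lectures sit under ~/downloads; the path and the chosen fields are placeholders you can adjust:
find ~/downloads -type f -name '*.mp3' -exec \
  mediainfo --Inform='General;%CompleteName%,%FileSize%,%Duration/String3%' {} \; > lectures.csv
Each invocation should print one comma-separated line with the full path, the size in bytes, and the duration, which can then be imported into LibreOffice Calc or a database.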
Thanks a lot for suggesting the tool wget for my need; all of you have been suggesting the same tool for the past month, but I had not noticed its importance. I have just read the Wikipedia page on wget and found some of its features, such as website crawling and resuming failed downloads. By using this tool I can finish my job as soon as possible.
Since 2003 I have used a number of offline browsers on Windows but have not found any on Linux. The offline browsers I used previously do not have a resume capability, so I had to keep checking the internet connection, because if the connection dropped the whole project had to be restarted. I have spent months of time just to download a single website.
I am afraid the same thing will happen again, which is why I thought of creating a database of the local files, a database of the remote website, and so on.
This time, before trying wget, I am going to read the wget manual and then try it. Can you suggest any GUI for the tool which exposes all the options specified in the manual? Can you also suggest a page which describes its features, a guide, etc.?
On the Windows forums, if I ask a simple question they may never reply, and even if they do, the answer will not be perfect; in the Linux world things are different. By asking the same question again and again I found that wget, which is open source, is best for my need.
Nowadays, I’d Google the command and your list of desirable features and hope that someone has blogged providing the command to do exactly what you want.
I just did a search on my personal machine and found the specific options that were most useful for what I was doing.