Need script to move files and create subfolders.

I have an OWL Document management server running on OpenSuse that has been in production for several years.
During this time they have managed to scan in over 100,000 documents to tif and pdf and word files stored in a few folders.
Problem: Folder browsing performance sucks.
To increase performance, the logical thing is to create subfolders based on the file creation or modified date stamp, and move files to subfolders thus decreasing the individual files count, thus increasing browsing performance.
I need a script that will:
Create subfolders based on month, Year, and move files to those folders.
Any suggestions?
Or an App that will do this in a GUI would be great.
Any help is appreciated.

I’d be surprised if there is a GUI-based automation tool for this kind of task but you never know :). Otherwise it looks like you’ve set yourself a nice project in running bash scripts revolving around the use of mkdir and date (or date +"%y.%m.%d"). I must confess I couldn’t do this entirely in Linux shell script (branch points I have scripted but not loops!), but I’m sure there are enough bash gurus around to help. Have you considered using a coding language that offers convenient string handling (e.g. Python)?

> Document management server running on OpenSuse that has
> been in production for several years.

-=Welcome=- new poster!

i wonder what version of openSUSE you are running, so therefore ask you
to please show us the terminal output from


cat /etc/SuSE-release

and i wonder whether or not it is connected to a network, internal or
external…


dd

Thanks for looking folks!
openSUSE 12.2 Mantis i586
Running on a PowerEdge 2850 with about 400 GB of RAID 5, dual dual-core Xeons at 3 GHz or so, and 4 GB of ECC RAM.
This will be its only task.
And is currently connected to LAN and Internet.
Yes, I have seen some bash scripts, but I am not that deep into shell programming.
But something like that.

On 11/28/2012 11:46 PM, mpaint wrote:
> openSUSE 12.2 Mantis i586

ok, but your “running on OpenSuse that has been in production for
several years.” had me expecting more like openSUSE 11.x or even 10.x


dd

The original server we are replacing is on SUSE enterprise 10.1 i believe…this is the new server. ;)

On 2012-11-28 20:46, mpaint wrote:

> I need a script that will:
> Create subfolders based on month, Year, and move files to those
> folders.
> Any suggestions?

You have to determine a criterion for deciding which files to save
where. Perhaps the date of the files themselves?

Also are all the files in the same folder, or distributed?

> Or an App that will do this in a GUI would be great.
> Any help is appreciated.

It doesn’t look complicated so far, just busy coding.

You could use ‘find’ to generate a list of files between two dates, and
move or copy to a selected destination.
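
A rough sketch of that 'find' idea, assuming GNU find (for -newermt) and GNU mv (for -t). The function name move_between and the scratch paths are made up for illustration. It moves every regular file modified after the start date and no later than the end date into one folder:

```bash
#!/bin/bash
# Sketch: move files modified after START and no later than END into DST.
# Assumes GNU find (-newermt) and GNU mv (-t); move_between is a made-up name.

move_between() {
    local src=$1 dst=$2 start=$3 end=$4
    mkdir -p "$dst" || return 1
    # -newermt compares each file's modification time with a date string;
    # "-exec ... {} +" hands many files to a single mv call
    find "$src" -maxdepth 1 -type f \
        -newermt "$start" ! -newermt "$end" \
        -exec mv -t "$dst" {} +
}

# Demo in a scratch directory so nothing real is touched
demo=$(mktemp -d)
mkdir "$demo/src"
touch -d "2012-10-15 12:00" "$demo/src/october.tif"
touch -d "2012-11-15 12:00" "$demo/src/november scan.pdf"
move_between "$demo/src" "$demo/2012-11" "2012-11-01" "2012-12-01"
```

Calling it once per month would cover the whole archive. Trying it on a copy of the data first is probably wise.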


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

?

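#!/bin/bash

src_dir=/tmp/srcdir
dst_base=/tmp/destdir

cd $src_dir || exit 1

for f in *; do
        # Target folder is named after the file's modification date, e.g. 20121129
        dst_dir=${dst_base}/$(stat -c "%y" $f | awk '{ gsub(/-/,"",$1) ; print $1 }')
        # Create the folder if it does not exist yet, then move the file into it
        [ -d $dst_dir ] || mkdir -p $dst_dir
        [ -d $dst_dir ] && mv $f $dst_dir
done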

You’ll have to replace /tmp/srcdir and /tmp/destdir in this example.
Although it’s quite simple, I recommend you understand this code before using it.
There are many different ways to do this in bash.
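
One of those other ways, as a hedged sketch: since the original request was for month and year subfolders, this variant builds year/month folders instead of one folder per day. It assumes GNU date (where -r FILE prints the file's last modification time). The name sort_by_month and the demo paths are invented for the example:

```bash
#!/bin/bash
# Sketch: sort files into <dst_base>/<year>/<month> by modification time.
# Assumes GNU date (-r FILE); sort_by_month is a made-up name.

sort_by_month() {
    local src_dir=$1 dst_base=$2 f dst_dir
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue        # skip subdirectories and empty globs
        dst_dir=$dst_base/$(date -r "$f" +%Y/%m)
        # mv -n refuses to overwrite a file that already exists at the target
        mkdir -p "$dst_dir" && mv -n "$f" "$dst_dir"/
    done
}

# Demo in a scratch directory
demo=$(mktemp -d)
mkdir "$demo/srcdir"
touch -d "2011-03-09 12:00" "$demo/srcdir/invoice.tif"
touch -d "2012-11-28 12:00" "$demo/srcdir/contract.pdf"
sort_by_month "$demo/srcdir" "$demo/destdir"
```

After the demo run, invoice.tif sits in destdir/2011/03 and contract.pdf in destdir/2012/11.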

On 11/29/2012 12:36 AM, mpaint wrote:
>
> The original server we are replacing is on SUSE enterprise 10.1 i
> believe…this is the new server. ;)

do you understand that there is quite a difference between SUSE
Enterprise Linux and openSUSE??

for one thing: one is a LONG term (SLES 10 SP1 is still supported),
stable, dependable Enterprise class operating system while the current
offering of openSUSE (12.2) will not be supported past Jan 2014 (site:
https://en.opensuse.org/Lifetime) and continues to have
sufficient growing pains that i won’t use it on my daily driver…

so, think about changing operating system version in the next 12 months,
and every 12 to 18 months thereafter…

and, before turning openSUSE loose with that RAID setup, i would
strongly suggest you consider using an operating system which is
certified by Dell for use on that hardware! i do NOT believe you will
find any of Ubuntu, Fedora or openSUSE certified for that hardware…

on the other hand you might (probably will) find SUSE Linux Enterprise
Server 10 or 11 listed!

i’ll leave it to you to find the spec sheet delivered by Dell with that
hardware…but, i think you are on a dangerous (to your data) path.

the forums for SUSE Enterprise Linux are at http://forums.suse.com/ and
the ID/Pass you used here will work there also…additionally, you will
find many SLES Admins there…some of which probably have the kind of
script you are wanting (whereas here we are mostly desktop users helping
other desktop users)…

ymmv


dd

mpaint wrote:
> I have an OWL Document management server running on OpenSuse that has
> been in production for several years.
> During this time they have managed to scan in over 100,000 documents to
> tif and pdf and word files stored in a few folders.
> Problem: Folder browsing performance sucks.
> To increase performance, the logical thing is to create subfolders
> based on the file creation or modified date stamp, and move files to
> subfolders thus decreasing the individual files count, thus increasing
> browsing performance.

I disagree. Having folders with subfolders creates an annoying interface
for the users, IME & HO. The sucky performance is most likely down to
the filesystem you have chosen. Personally, I use Reiser because it
handles things like this well. But given the cloud hanging over it, you
might want to look at xfs or even btrfs (keep good backups!).

Alternatively, change the requirement. What app do they use for browsing
that gives them sucky performance? Why are they browsing the whole set
of files instead of searching to get a short list etc etc? e.g. is there
a role for a database here or even one of the dreaded filesystem
scanning bots.

On 2012-11-29 08:37, dd wrote:
> do you understand that there is quite a difference between SUSE
> Enterprise Linux and openSUSE??

Yes, but that is irrelevant :)

It is just a matter of describing how to select the files that will go
in one directory (dates, names, random?) and then do some really simple
coding to do the choosing and moving. It can be done in any Linux in the
same manner.


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

On 11/29/2012 11:58 AM, Carlos E. R. wrote:
> Yes, but that is irrelevant

correct, it is irrelevant to the question asked

but my post is relevant to the question not asked: have a server which
has run for years on SLES 10.1 which is being replaced by a new Dell
Poweredge 2850 (certified for SLES, Red Hat and a few others) and i have
loaded it with openSUSE to serve about 400GB of Raid 5 including 100,000
documents of tif, pdf and word files stored in a few folders.

how do i speed up the user’s file browsing performance on the new
Poweredge with openSUSE 12.2??


dd

Yes. I also wondered why the OP needed to browse directories … but didn’t ask. Otherwise I would have suggested using a database or one of these file indexing programs.

On 2012-11-29 13:00, dd wrote:
> On 11/29/2012 11:58 AM, Carlos E. R. wrote:
>> Yes, but that is irrelevant
>
> correct, it is irrelevant to the question asked
>
> but my post is relevant to the question not asked: have a server which
> has run for years on SLES 10.1 which is being replaced by a new Dell
> Poweredge 2850 (certified for SLES, Red Hat and a few others) and i have
> loaded it with openSUSE to serve about 400GB of Raid 5 including 100,000
> documents of tif, pdf and word files stored in a few folders.

Money is scarce these days…

> how do i speed up the user’s file browsing performance on the new
> Poweredge with openSUSE 12.2??

The delay is probably because they are using graphical file browsers
that have to load the entire directory with thousands of files before
they display that directory. This is normal.

If instead you request the file by name directly without browsing, it
should be fast. And it is faster with a filesystem like reiserfs. It
would be terrible with FAT.

Typical :)

If the retrieval is done by humans using applications that browse
directories, the operation will be faster if the files are sorted in
smaller directories, provided the users know in which directory the
files will be stored, instead of having to search several folders.

On the other hand, it is possible to have both structures. One flat
directory with all files, and a hierarchical tree holding hardlinks to
the first. You only need a script that maintains the links periodically
or on request.
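
A minimal sketch of such a link-maintaining script, assuming GNU date and that the flat directory and the tree live on the same filesystem (hard links cannot cross filesystems). The names link_tree, flat and tree are only illustrative:

```bash
#!/bin/bash
# Sketch: leave the flat directory untouched and build a year/month tree
# of hard links to its files. Assumes GNU date (-r FILE) and a single
# filesystem; link_tree is a made-up name.

link_tree() {
    local flat_dir=$1 tree_base=$2 f dst_dir
    for f in "$flat_dir"/*; do
        [ -f "$f" ] || continue
        dst_dir=$tree_base/$(date -r "$f" +%Y/%m)
        mkdir -p "$dst_dir" || return 1
        # Create the link only if it is missing, so reruns are cheap
        [ -e "$dst_dir/${f##*/}" ] || ln "$f" "$dst_dir"/
    done
}

# Demo in a scratch directory
demo=$(mktemp -d)
mkdir "$demo/flat"
echo "scan" > "$demo/flat/report.pdf"
touch -d "2012-06-15 12:00" "$demo/flat/report.pdf"
link_tree "$demo/flat" "$demo/tree"
```

This only adds links. It does not remove links for files that were later deleted from the flat directory, so a real maintenance script would need a cleanup pass as well.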


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

The application used to access these files is the OWL Document Management System. It is based on Apache, PHP, and MySQL, and has been around for a long time. More info on it here:
Owl Intranet Engine
Actually on a graph, reiserfs and ext3 have about the same performance in enumerating files in a folder. EXT2 without journaling is much faster at enumerating files in a folder. NTFS is actually much faster, but I don’t want to move to Windows, for stability, and resistance to malware issues.
The original and still production box dates from 2005, and the hardware is failing, we have had to replace capacitors on the motherboard 2X, and hard drives have failed requiring replacement and rebuilding the Software RAID1 2X, it is backed up, so no data loss has occurred. But time is money and this business relies on this Document Management System for routine storage and retrieval of documents.
Retrieval IS usually done using a search function, but occasionally the users do Browse the folders for maintenance etc.
We purchased a PE 2850 to have robust hardware and a 3 year warranty.
Previous box was software raid this is Perc hardware raid.
So I have not wasted so much time on this that I cannot go to SEL, but the customer has made it clear that they will NOT pay ongoing subscription fees. So regardless of OpenSUSE, or SEL, there will be no updates in a few months.
This is an internal not internet facing server, no public access to it.
I mean I can start over with SELS, and do have trial keys I can use. But in 90 days there will be no more updates.
At least with OpenSUSE, they have a couple of years, and there will be an upgrade path that will not require complete reinstallation, likely.
The OWL uses a “Look at hard drive” setting to refresh the data, so regardless of browsing in a GUI or using system calls to read the files in the folders and update the database, less files in a folder results in better performance. I have done a lot of research and it all points to the fact that NO file system will give sterling performance on enumerating files in folder with 100K files in one folder.
I appreciate all of your input.
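
To check how many files actually end up in each folder before and after a reorganisation, something like the following could be used. It is a sketch assuming GNU find (for -printf); count_per_folder and the demo paths are made-up names:

```bash
#!/bin/bash
# Sketch: list how many regular files sit directly in each subfolder,
# largest first. Assumes GNU find (-printf); count_per_folder is a made-up name.

count_per_folder() {
    local base=$1 d n
    for d in "$base"/*/; do
        [ -d "$d" ] || continue
        # Print one dot per file and count the dots, so odd file names
        # (spaces, newlines) cannot skew the result
        n=$(find "$d" -maxdepth 1 -type f -printf '.' | wc -c)
        printf '%d\t%s\n' "$n" "${d%/}"
    done | sort -rn
}

# Demo in a scratch directory
demo=$(mktemp -d)
mkdir -p "$demo/2012-10" "$demo/2012-11"
touch "$demo/2012-10/a.tif" "$demo/2012-11/b.tif" "$demo/2012-11/c d.pdf"
report=$(count_per_folder "$demo")
echo "$report"
```

Each output line is a count, a tab, and the folder path, so the fullest folders show up at the top.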

On 2012-11-29 18:36, mpaint wrote:

> The original and still production box dates from 2005, and the
> hardware is failing, we have had to replace capacitors on the
> motherboard 2X, and hard drives have failed requiring replacement and
> rebuilding the Software RAID1 2X, it is backed up, so no data loss has
> occurred. But time is money and this business relies on this Document
> Management System for routine storage and retrieval of documents.
> Retrieval IS usually done using a search function, but occasionally the
> users do Browse the folders for maintenance etc.

You might need to have both types of structure, flat and tree. It is
possible with hardlinks; I have done that on a much smaller scale.

> We purchased a PE 2850 to have robust hardware and a 3 year warranty.
> Previous box was software raid this is Perc hardware raid.
> So I have not wasted so much time on this that I cannot go to SEL, but
> the customer has made it clear that they will NOT pay ongoing
> subscription fees. So regardless of OpenSUSE, or SEL, there will be no
> updates in a few months.

SEL? You mean SLES, I suppose.

> The OWL uses a “Look at hard drive” setting to refresh the data, so
> regardless of browsing in a GUI or using system calls to read the files
> in the folders and update the database, less files in a folder results
> in better performance. I have done a lot of research and it all points
> to the fact that NO file system will give sterling performance on
> enumerating files in folder with 100K files in one folder.

Reiserfs claimed that they do. But the version that the kernel includes
does not scale well.


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

Hi,

Great code, but will it work if the filename contains a space?  Lenwolf

I think you know the answer… but you can fix it if you happen to use spaces in filenames. I don’t.

I do have spaces in some file names, but more importantly the script is running, and seeming to have the desired effect!
It does throw a lot of errors like "mv: cannot stat 'filename': No such file or directory". I assume that is because the file was already moved and it is a loop.
Will let it run through on this folder and see what I get.
Thanks!
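
For what it is worth, one plausible cause of those "cannot stat" messages, given the spaces in some names, is word splitting rather than files having been moved already. An unquoted $f containing a space is handed to stat and mv as two separate names, neither of which exists, so such files are likely left behind in the source folder. The sketch below shows the effect and the same daily-folder loop with every expansion quoted. The names count_args and move_by_day are made up for the example, and GNU stat is assumed:

```bash
#!/bin/bash
# Sketch: why unquoted $f breaks on names with spaces, and the quoted fix.
# Assumes GNU stat (-c); count_args and move_by_day are made-up names.

count_args() { echo "$#"; }

f="scan 001.tif"
unquoted=$(count_args $f)      # word splitting: "scan" and "001.tif"
quoted=$(count_args "$f")      # one argument, as intended
echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"

# The same loop with quoting; the ( ) body runs in a subshell so the
# caller's working directory is not changed by cd. Pass dst_base as an
# absolute path, because the loop runs inside src_dir.
move_by_day() (
    src_dir=$1 dst_base=$2
    cd "$src_dir" || exit 1
    for f in *; do
        [ -f "$f" ] || continue
        dst_dir=$dst_base/$(stat -c "%y" -- "$f" | awk '{ gsub(/-/,"",$1); print $1 }')
        mkdir -p "$dst_dir" && mv -- "$f" "$dst_dir"/
    done
)
```

Running move_by_day /tmp/srcdir /tmp/destdir on the leftover files should pick up the ones the first pass skipped, assuming spaces were indeed the problem.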