How to count the number of pages of all djvu files contained in a directory and store the result in a text file.

rupeshforu3 · December 7, 2017, 2:48am

Hi, I am Rupesh from India. I have 2400 PDF files which I converted to djvu files using the pdf2djvu utility. All the files were converted without any warnings, but I want to check whether each original PDF file and its converted djvu file have the same number of pages. I only want to compare the page count and ignore everything else, such as quality.

I converted the PDF files to djvu in order to save space. I am going to keep both types of files and use the djvu files, but if I find any error I will read the corresponding PDF file.

I have obtained the page count of all the PDF files using the pdfinfo utility and stored it in a text file. If I can obtain the page count of all the djvu files and store the output in a text file, I can compare the two text files using the diff utility.

The command used to count the pages of a djvu file is djvused, and I am able to get the page count of a single file with it.

When I issue the command

djvused -e n filename.djvu

I get a single line of output:

150

I tried saving the output of the above command to a text file through redirection and succeeded, like this:

djvused -e n filename.djvu > output.txt

I want to obtain the number of pages of every file and store the output in a text file in the following pattern:

Filename.djvu page_count.

In order to achieve this I have created a simple shell script, shown below.


cd /run/media/root/Others/temp/converted\ djvu/ttd/Home/Download/
for f in *.djvu
do
    echo "file name:  $f"
    djvused -e n $f > output.txt
done

On executing the above script I get the names of all the files, but I get the page count for only a few files, and even that with errors. Some of the output is shown below.


DJVUSED --- DjVuLibre-3.5.25
Simple DjVu file manipulation program

Usage: djvused [options] djvufile
Executes scripting commands on djvufile.
Script command come either from a script file (option -f),
from the command line (option -e), or from stdin (default).

Options are
  -v               -- verbose
  -f <scriptfile>  -- take commands from a file
  -e <script>      -- take commands from the command line
  -s               -- save after execution
  -u               -- produces utf8 instead of escaping non ascii chars
  -n               -- do not save anything


Commands
--------
The following commands can be separated by newlines or semicolons.
Comment lines start with '#'.  Commands usually operate on pages and files
specified by the "select" command.  All pages and files are initially selected.
A single page must be selected before executing commands marked with a period.
Commands marked with an underline do not use the selection

   ls                     -- list all pages/files
   n                      -- list pages count
   dump                   -- shows IFF structure
   size                   -- prints page width and height in html friendly way
   select                 -- selects the entire document
   select <id>            -- selects a single page/file by name or page number
   select-shared-ant      -- selects the shared annotations file
   create-shared-ant      -- creates and select the shared annotations file
   showsel                -- displays currently selected pages/files
 . print-ant              -- prints annotations
 . print-merged-ant       -- prints annotations including the shared annotations
 . print-meta             -- prints file metadatas (a subset of the annotations
   print-txt              -- prints hidden text using a lisp syntax
   print-pure-txt         -- print hidden text without coordinates
 _ print-outline          -- print outline (bookmarks)
 . print-xmp              -- print xmp annotations
   output-ant             -- dumps ant as a valid cmdfile
   output-txt             -- dumps text as a valid cmdfile
   output-all             -- dumps ant and text as a valid cmdfile
 . set-ant <antfile>]    -- copies <antfile> into the annotation chunk
 . set-meta <metafile>]  -- copies <metafile> into the metadata annotation tag
 . set-txt <txtfile>]    -- copies <txtfile> into the hidden text chunk
 . set-xmp <xmpfile>]    -- copies <xmpfile> into the xmp metadata annotation tag
 _ set-outline <bmfile>] -- sets outline (bootmarks)
 _ set-thumbnails <sz>]  -- generates all thumbnails with given size
   remove-ant             -- removes annotations
   remove-meta            -- removes metadatas without changing other annotations
   remove-txt             -- removes hidden text
 _ remove-outline         -- removes outline (bookmarks)
 . remove-xmp             -- removes xmp metadata from annotation chunk
 _ remove-thumbnails      -- removes all thumbnails
 . set-page-title <title> -- sets an alternate page title
 . save-page <name>       -- saves selected page/file as is
 . save-page-with <name>  -- saves selected page/file, inserting all included files
 _ save-bundled <name>    -- saves as bundled document under fname
 _ save-indirect <name>   -- saves as indirect document under fname
 _ save                   -- saves in-place
 _ help                   -- prints this message

Interactive example:
--------------------
  Type
    % djvused -v file.djvu
  and play with the commands above

Command line example:
---------------------
  Save all text and annotation chunks as a djvused script with
    % djvused file.djvu -e output-all > file.dsed
  Then edit the script with any text editor.
  Finally restore the modified text and annotation chunks with
    % djvused file.djvu -f file.dsed -s
  You may use option -v to see more messages

I think it is showing the usage message because something went wrong. May I know what I have done wrong?

I searched for a GUI for djvused, found djvusmooth and installed it, but I am unable to run it. The errors are below.


linux-tg2q:~ # djvusmooth
Traceback (most recent call last):
  File "/usr/bin/djvusmooth", line 20, in <module>
    from djvusmooth.gui.main import Application
  File "/usr/lib/python2.7/site-packages/djvusmooth/gui/main.py", line 28, in <module>
    import djvusmooth.dependencies as __dependencies
  File "/usr/lib/python2.7/site-packages/djvusmooth/dependencies.py", line 70, in <module>
    _check_djvu()
  File "/usr/lib/python2.7/site-packages/djvusmooth/dependencies.py", line 45, in _check_djvu
    python_djvu_decode_version, ddjvu_api_version = djvu_decode_version.split('/')
ValueError: need more than 1 value to unpack
linux-tg2q:~ #

If djvusmooth works properly, is it possible to create a text file which contains each filename and its page count?

Please suggest how to obtain the page count, together with the file name, of all the djvu files and store the information in a text file.

I_A · December 7, 2017, 9:00am

Why would you convert PDF files to djvu?
djvu is used for scanned documents; it is an image format, not a document format.
That being said, your first script has a strange cd path and is missing ; at the end of each command.
Try:

cd /run/media/root/Others/temp/converted\ djvu/ttd/Home/Download/
for f in *.djvu
do
    echo "file name:  $f";
    djvused -e n $f > output.txt;
done
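
A side note on the redirection in this loop: `> output.txt` inside the loop body reopens and truncates the file on every iteration, so only the output of the last file would survive. Below is a minimal sketch of the difference. It uses `printf` as a stand-in for `djvused`, and the temporary directory and file names are made up for illustration.

```shell
#!/bin/sh
# Stand-in demo: printf plays the role of djvused; all names are hypothetical.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch "a.djvu" "b c.djvu" "d.djvu"

# Redirect inside the loop: inside.txt is truncated on each pass.
for f in *.djvu; do
    printf '%s\n' "$f" > inside.txt
done

# Redirect after "done": outside.txt is opened once for the whole loop.
for f in *.djvu; do
    printf '%s\n' "$f"
done > outside.txt

wc -l < inside.txt    # one line: only the last file name remains
wc -l < outside.txt   # three lines: one per file
```

Using `>>` inside the loop would also keep every line, but it appends to whatever is left over from an earlier run.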

Your second command looks like a developer bug or a missing library in djvusmooth. Ask on the developer's issue page:
https://github.com/jwilk/djvusmooth/issues

rupeshforu3 · December 7, 2017, 9:32am

The 2000 PDF files which I converted consist only of scanned images without any text, which is why I converted them. I did the conversion to see whether the djvu files are of acceptable quality or not.

When I executed the above script it gave page counts for some files, but for some files it gave errors.

rupeshforu3 · December 7, 2017, 10:24am

As you suggested, I copied the directory to another place so that the path does not look strange, and I placed ; at the end of the commands. The modified script is below.


cd /run/media/root/Others/Download
for f in *.djvu
do
    echo "file name:  $f";
    djvused -e n $f > output.txt;
done

I executed the above script but it was no use: it displays the usage message for most files. When I use djvused on a single file it works properly, but when I use it in a script it does not.

For your reference, here is part of the output of the ls command executed in the directory whose files I want to count:


linux-tg2q:/run/media/root/Others/Download # pwd
/run/media/root/Others/Download

108 Vaishnavite Divya Desams Vol 1.djvu
108 Vaishnavite Divya Desams Vol 2.djvu
108 Vaishnavite Divya Desams Vol 3.djvu
108 Vaishnavite Divya Desams Vol 5.djvu
108 Vaishnavite Divya Desams Vol 6.djvu
108 Vaishnavite Divya Desams Vol 7.djvu
A Day A Divine Thought.djvu
A Glossary Of Philosophical Terms.djvu
A History Of Tirupati Vol I Second Edition.djvu
A M V Narasimhacahryula Chaturdi.djvu
A Monograph On Sri Tyagaraja Swamys Ghana Raga Pancharatna Keertanas.djvu
A Study In Spirituality And Human Development Resource.djvu
A Study Of The Compositions Of Purandaradasa And Tyagaraja.djvu
A Synopsis Of Srimath Bhagavatam Vol I Skandas I to VIII.djvu

I suspect that djvused is not detecting the djvu files because the filenames contain spaces. Is that true?

Regarding djvusmooth, is it necessary to update Python to a higher version? When I examine the errors I see lines like "unable to load .py".
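
That suspicion can be checked without touching the real files. The sketch below uses a hypothetical helper, `count_args`, which only reports how many arguments it received. An unquoted `$f` is split on spaces by the shell, while `"$f"` is passed as one argument.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it was given.
count_args() {
    echo "$#"
}

f="A Day A Divine Thought.djvu"   # one of the file names listed above

count_args $f      # unquoted: the shell splits the name on spaces, prints 5
count_args "$f"    # quoted: the name stays a single argument, prints 1
```

With the unquoted form, a command such as djvused would receive five separate words instead of one file name, which would be consistent with it printing its usage message.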

knurpht · December 7, 2017, 12:48pm

Why don't you get the info from the PDFs?

pdfinfo FILENAME.pdf | grep -i pages
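
To apply this to a whole directory, one possible sketch follows. It assumes that pdfinfo prints a line of the form `Pages: N`, and `pdf_page_counts` is a hypothetical function name, not part of pdfinfo. It prints one line per PDF, with the name (extension stripped) and the page count separated by a tab.

```shell
#!/bin/sh
# pdf_page_counts is a hypothetical helper, not part of pdfinfo.
# It assumes pdfinfo prints a line such as "Pages:          150".
pdf_page_counts() {
    dir=$1
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        # Quote "$f" so names with spaces stay one argument.
        pages=$(pdfinfo "$f" 2>/dev/null | awk '/^Pages:/ {print $2}')
        printf '%s\t%s\n' "$(basename "$f" .pdf)" "$pages"
    done
}

# Example use (the path is an assumption):
# pdf_page_counts /path/to/pdfs > pdf_pages.txt
```

Stripping the extension makes the lines directly comparable with a list built the same way from the djvu files.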

rupeshforu3 · December 7, 2017, 5:55pm

I am glad to say that I have succeeded, and I am going to explain the process.

As I guessed, pdfinfo and djvused did not work in my scripts with filenames containing spaces, so I replaced the spaces with the character ‘_’ in the names of all the PDF and djvu files. After that I used two scripts, one for the djvus and one for the PDFs:


cd /run/media/root/Others/temp/djvus2
for f in *.djvu
do
    echo "file name:  $f";
    djvused -e n $f;
done


cd /run/media/root/Source/temp/pdfs2

for f in *.pdf
do
    echo "file name:  $f";
    pdfinfo $f; 
done

I ran the above scripts as follows:

./count_pdf.sh > pdf_output.txt 2> pdf_errors.txt
./count_djvu.sh > djvu_output.txt 2>> djvu_errors

On executing the above scripts, two huge text files named pdf_output.txt and djvu_output.txt were generated. I then used grep to extract the filenames and page counts from the two files and compared them using the diff tool. Up to 95 percent of the PDF files were converted successfully.

pdfinfo gave four kinds of errors, seemingly at random, for up to 150 files. They are:
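
The comparison step might look roughly like the sketch below. It rests on assumptions: both lists have already been reduced to lines of name, tab, page count with the extension stripped, and `compare_counts` is a hypothetical helper name.

```shell
#!/bin/sh
# compare_counts is a hypothetical helper: it takes two files of
# "name<TAB>count" lines and prints the lines that differ between them.
compare_counts() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    # diff exits with status 1 when the inputs differ; that is expected here.
    diff "$a_sorted" "$b_sorted"
    status=$?
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}

# Example use (file names are assumptions):
# compare_counts pdf_pages.txt djvu_pages.txt > mismatches.txt
```

Sorting both lists first means the order in which the files were processed does not produce false differences.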


Syntax Error: Expected the optional content group list, but wasn't able to find it, or it isn't an Array
Syntax Error: Marked object is wrong type (boolean)
Syntax Warning: Invalid least number of objects reading page offset hints table
Syntax Warning: Invalid number of shared object groups

djvused gave no errors at all.

May I know the meaning of the pdfinfo errors?

Thanks to knurpht and I_A for giving valuable suggestions.

I_A · December 8, 2017, 4:20am

The pdfinfo errors are related to bad PDF files, usually introduced during PDF generation. You could fix them, but as you have thousands of files that is too much work. If the files render in a PDF viewer, leave them alone; the internal errors can usually be ignored.

rupeshforu3 · December 8, 2017, 7:23am

Thanks for your suggestion. I am going to ignore those errors.

jetchisel · December 8, 2017, 1:01pm

Hi,

The original scripts failed on names with spaces because you did not quote the variable $f. Always use "$f" (inside double quotes) to prevent word splitting and to avoid the expansion of special characters that the shell might interpret.


cd /run/media/root/Others/temp/djvus2
for f in *.djvu
do
    echo "file name:  $f";
    djvused -e n "$f"
done


cd /run/media/root/Source/temp/pdfs2

for f in *.pdf
do
    echo "file name:  $f"
    pdfinfo "$f" 
done
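
With the quoting in place, the layout asked for in the first post (file name followed by page count) could be produced along these lines. This is only a sketch: `djvu_page_counts` is a hypothetical function name, and it relies on nothing more than `djvused -e n` printing the page count, as shown earlier in the thread.

```shell
#!/bin/sh
# djvu_page_counts is a hypothetical helper built on "djvused -e n",
# which prints the number of pages in a DjVu file.
djvu_page_counts() {
    dir=$1
    for f in "$dir"/*.djvu; do
        [ -e "$f" ] || continue                  # no .djvu files: nothing to do
        pages=$(djvused -e n "$f" 2>/dev/null)   # "$f" keeps spaces intact
        printf '%s\t%s\n' "$(basename "$f" .djvu)" "$pages"
    done
}

# Example use (the path is an assumption):
# djvu_page_counts /path/to/djvus > djvu_pages.txt
```

Because the name and the count are printed on one line, no grep step is needed afterwards, and the original file names with spaces can be kept.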

rupeshforu3 · December 9, 2017, 2:39pm

Does your idea apply to all cases when using scripts and the terminal? I have a text file which consists of file names with spaces and special characters like %, and I want to create those files in the current directory. Is it possible to create the files without any modification of the names? I mean, if there are 1000 lines in the text file, I want to create files named exactly as in the text file.

Previously I placed touch " at the beginning of each line and " at the end, so the lines follow this pattern:
touch "file name.mp3"
After that I copied all the lines to a new shell script named create_directories.sh. When I executed that script I got a number of errors.

Please try the command below and let me know how to create files with white space in their names.


touch "one two three"

jetchisel · December 9, 2017, 4:20pm

Hi,

So the filenames are inside a text file; I am assuming one name per line.

So here is an example:

First create a test directory and go inside it once it is created.

mkdir test && cd "$_"

Now create a file called TextFile (the name is arbitrary) and populate it with some lines using a for loop.

for ((i=0;i<=9;i++)); do
  printf '%s\n' "Line:$i" >> TextFile
done

Now check what is inside the TextFile.

cat TextFile


Line:0
Line:1
Line:2
Line:3
Line:4
Line:5
Line:6
Line:7
Line:8
Line:9

Now finally create the files based on the lines of TextFile.

while IFS= read -r line; do
  printf '%s\n' "$line" > "$line"
done < TextFile

ls


Line:0  Line:1  Line:2  Line:3  Line:4  Line:5  Line:6  Line:7  Line:8  Line:9  TextFile

The part you should focus on is the while read loop, since it is one way to create files based on the content of TextFile.
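
For the case described in the question, where each line of a list should become an empty file, the same loop can call `touch` instead of `printf`. A sketch, with `create_from_list` as a hypothetical helper name:

```shell
#!/bin/sh
# create_from_list is a hypothetical helper: it creates one empty file
# per line of the list file given as $1, inside the directory given as $2.
create_from_list() {
    list=$1
    target=$2
    while IFS= read -r name; do
        [ -n "$name" ] || continue        # skip blank lines
        # The double quotes keep spaces and characters such as % literal.
        touch "$target/$name"
    done < "$list"
}

# Example use (file and directory names are assumptions):
# create_from_list names.txt .
```

A name that contains a slash would be treated as a path, so lines like that need separate handling.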

jetchisel · December 9, 2017, 6:28pm

Hi,

The original question was "how to create a filename with white spaces".

Well, yes: just as you did, use quotes around the filenames.