
Thread: How to count the number of pages of all djvu files contained in a directory and store it in a text file

  1. #1
    Join Date
    Aug 2011
    Location
    India
    Posts
    205

    Hi, I am Rupesh from India. I have 2400 PDF files that I converted to djvu files using the pdf2djvu utility. All the files converted without any warnings, but I want to check whether each original PDF file and its converted djvu file have the same number of pages. I only want to compare page counts and ignore everything else, such as quality.

    I converted the PDF files to djvu in order to save space. I am going to keep both types of files and use the djvu files, but if I find an error in one I will read the corresponding PDF file.

    I have obtained the page count of all the PDF files using the pdfinfo utility and stored it in a text file. If I can obtain the page count of all the djvu files and store that in a text file too, I can compare the two text files using the diff utility.

    The command for counting the pages of a djvu file is djvused, and I am able to get a page count with it.

    For example, when I issue the command

    djvused -e n filename.djvu

    I get a single line of output containing

    150

    I have also managed to save the output of the above command to a text file through redirection:

    djvused -e n filename.djvu > output.txt

    I want to obtain the number of pages of every file and store the output in a text file in the following pattern

    Filename.djvu page_count

    To achieve this I created a simple shell script, shown below

    Code:
    cd /run/media/root/Others/temp/converted\ djvu/ttd/Home/Download/
    for f in *.djvu
    do
        echo "file name:  $f"
        djvused -e n $f > output.txt
    done
    On executing the above script I get the filenames of all the files, but I get a page count for only a few files, and errors for the rest. Some of the output is shown below

    Code:
    DJVUSED --- DjVuLibre-3.5.25
    Simple DjVu file manipulation program
    
    Usage: djvused [options] djvufile
    Executes scripting commands on djvufile.
    Script command come either from a script file (option -f),
    from the command line (option -e), or from stdin (default).
    
    Options are
      -v               -- verbose
      -f <scriptfile>  -- take commands from a file
      -e <script>      -- take commands from the command line
      -s               -- save after execution
      -u               -- produces utf8 instead of escaping non ascii chars
      -n               -- do not save anything
    
    
    Commands
    --------
    The following commands can be separated by newlines or semicolons.
    Comment lines start with '#'.  Commands usually operate on pages and files
    specified by the "select" command.  All pages and files are initially selected.
    A single page must be selected before executing commands marked with a period.
    Commands marked with an underline do not use the selection
    
       ls                     -- list all pages/files
       n                      -- list pages count
       dump                   -- shows IFF structure
       size                   -- prints page width and height in html friendly way
       select                 -- selects the entire document
       select <id>            -- selects a single page/file by name or page number
       select-shared-ant      -- selects the shared annotations file
       create-shared-ant      -- creates and select the shared annotations file
       showsel                -- displays currently selected pages/files
     . print-ant              -- prints annotations
     . print-merged-ant       -- prints annotations including the shared annotations
     . print-meta             -- prints file metadatas (a subset of the annotations
       print-txt              -- prints hidden text using a lisp syntax
       print-pure-txt         -- print hidden text without coordinates
     _ print-outline          -- print outline (bookmarks)
     . print-xmp              -- print xmp annotations
       output-ant             -- dumps ant as a valid cmdfile
       output-txt             -- dumps text as a valid cmdfile
       output-all             -- dumps ant and text as a valid cmdfile
     . set-ant [<antfile>]    -- copies <antfile> into the annotation chunk
     . set-meta [<metafile>]  -- copies <metafile> into the metadata annotation tag
     . set-txt [<txtfile>]    -- copies <txtfile> into the hidden text chunk
     . set-xmp [<xmpfile>]    -- copies <xmpfile> into the xmp metadata annotation tag
     _ set-outline [<bmfile>] -- sets outline (bootmarks)
     _ set-thumbnails [<sz>]  -- generates all thumbnails with given size
       remove-ant             -- removes annotations
       remove-meta            -- removes metadatas without changing other annotations
       remove-txt             -- removes hidden text
     _ remove-outline         -- removes outline (bookmarks)
     . remove-xmp             -- removes xmp metadata from annotation chunk
     _ remove-thumbnails      -- removes all thumbnails
     . set-page-title <title> -- sets an alternate page title
     . save-page <name>       -- saves selected page/file as is
     . save-page-with <name>  -- saves selected page/file, inserting all included files
     _ save-bundled <name>    -- saves as bundled document under fname
     _ save-indirect <name>   -- saves as indirect document under fname
     _ save                   -- saves in-place
     _ help                   -- prints this message
    
    Interactive example:
    --------------------
      Type
        % djvused -v file.djvu
      and play with the commands above
    
    Command line example:
    ---------------------
      Save all text and annotation chunks as a djvused script with
        % djvused file.djvu -e output-all > file.dsed
      Then edit the script with any text editor.
      Finally restore the modified text and annotation chunks with
        % djvused file.djvu -f file.dsed -s
      You may use option -v to see more messages
    I think it is showing the help text because something went wrong. May I know what I have done wrong?

    I searched for a GUI for djvused, found djvusmooth and installed it, but I am unable to run it. The errors are below

    Code:
    linux-tg2q:~ # djvusmooth
    Traceback (most recent call last):
      File "/usr/bin/djvusmooth", line 20, in <module>
        from djvusmooth.gui.main import Application
      File "/usr/lib/python2.7/site-packages/djvusmooth/gui/main.py", line 28, in <module>
        import djvusmooth.dependencies as __dependencies
      File "/usr/lib/python2.7/site-packages/djvusmooth/dependencies.py", line 70, in <module>
        _check_djvu()
      File "/usr/lib/python2.7/site-packages/djvusmooth/dependencies.py", line 45, in _check_djvu
        python_djvu_decode_version, ddjvu_api_version = djvu_decode_version.split('/')
    ValueError: need more than 1 value to unpack
    linux-tg2q:~ #
    If djvusmooth works properly, is it possible to create a text file which contains each filename and its page count?

    Please suggest how to save the page count along with the file name for all the djvu files in a text file.
    Regards,
    Rupesh.

  2. #2

    Why would you convert PDF files to djvu?
    Djvu is used for scanned documents; it is an image format, not a document format.
    That being said, your first script has a strange cd path and is missing ; at the end of each command.
    Try
    Code:
    cd /run/media/root/Others/temp/converted\ djvu/ttd/Home/Download/
    for f in *.djvu
    do
        echo "file name:  $f";
        djvused -e n $f > output.txt;
    done
    Your second command looks like a developer bug or a missing library with djvusmooth. Ask at the developers' page:
    https://github.com/jwilk/djvusmooth/issues

  3. #3
    Join Date
    Aug 2011
    Location
    India
    Posts
    205

    The 2000 PDF files which I converted consist only of scanned images without any text, which is why I converted them. I converted them to see whether the djvu files are of acceptable quality or not.

    When I executed the above script it gave page counts for some files, but for some files it gave errors.
    Regards,
    Rupesh.

  4. #4
    Join Date
    Aug 2011
    Location
    India
    Posts
    205

    As you suggested, I copied the directory to another place so that the path does not look strange, and I placed ; at the end of the commands. The modified script is below

    Code:
    cd /run/media/root/Others/Download
    for f in *.djvu
    do
        echo "file name:  $f";
        djvused -e n $f > output.txt;
    done
    I executed the above script but it was no use: it displays the help text for most files. When I use djvused on a single file it works properly, but when I use it in the script it does not.
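    One more thing about the script above: the redirection "> output.txt" sits inside the loop, so output.txt is truncated on every iteration and only the last file's result would survive. A minimal sketch of the difference, using echo in place of djvused (the temporary file and the item names are made up for illustration):

```shell
out=$(mktemp)

# Redirecting inside the loop: every iteration truncates the file again,
# so only the last line is left at the end.
for n in 1 2 3; do
    echo "item $n" > "$out"
done
inside=$(wc -l < "$out")

# Redirecting the loop as a whole: the file is opened once
# and collects the output of every iteration.
for n in 1 2 3; do
    echo "item $n"
done > "$out"
whole=$(wc -l < "$out")

echo "redirect inside loop: $inside line(s); redirect after done: $whole line(s)"
rm -f "$out"
```

    Putting the redirection after "done", or redirecting the whole script when it is called, keeps every result in the file.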

    For your reference, here is part of the output of the ls command executed on the directory I want to count
    Code:
    linux-tg2q:/run/media/root/Others/Download # pwd
    /run/media/root/Others/Download
    
    108 Vaishnavite Divya Desams Vol 1.djvu
    108 Vaishnavite Divya Desams Vol 2.djvu
    108 Vaishnavite Divya Desams Vol 3.djvu
    108 Vaishnavite Divya Desams Vol 5.djvu
    108 Vaishnavite Divya Desams Vol 6.djvu
    108 Vaishnavite Divya Desams Vol 7.djvu
    A Day A Divine Thought.djvu
    A Glossary Of Philosophical Terms.djvu
    A History Of Tirupati Vol I Second Edition.djvu
    A M V Narasimhacahryula Chaturdi.djvu
    A Monograph On Sri Tyagaraja Swamys Ghana Raga Pancharatna Keertanas.djvu
    A Study In Spirituality And Human Development Resource.djvu
    A Study Of The Compositions Of Purandaradasa And Tyagaraja.djvu
    A Synopsis Of Srimath Bhagavatam Vol I Skandas I to VIII.djvu
    I suspect that djvused is not handling the djvu files because the filenames contain spaces. Is that true?
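    This suspicion can be checked without djvused at all. A minimal sketch, assuming bash or another POSIX shell with the default IFS; count_args is a hypothetical helper that only reports how many arguments it received:

```shell
# One of the file names from the listing above
f="108 Vaishnavite Divya Desams Vol 1.djvu"

# Hypothetical helper: print the number of arguments received
count_args() { echo "$#"; }

count_args $f      # unquoted: the shell splits the name at every space
count_args "$f"    # quoted: the name is passed as a single argument
```

    Unquoted, the name reaches the command as several separate words instead of one file name, which is presumably why djvused falls back to printing its usage text. The spaces themselves are not the problem; the missing quotes are.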

    Regarding djvusmooth, is it necessary to update python to a higher version? When I examine the errors I see lines that seem to say it is unable to load .py files.
    Regards,
    Rupesh.

  5. #5
    Join Date
    Jun 2008
    Location
    Groningen, Netherlands
    Posts
    16,725
    Blog Entries
    13

    Why don't you get the info from the PDFs?

    Code:
    pdfinfo FILENAME.pdf | grep -i pages
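    To run this over a whole directory and get one "filename page_count" line per file, something along these lines might work. It is only a sketch: count_pdf_pages is a made-up function name, awk is used instead of grep to pick the number out of the "Pages:" line, and files for which no count is found are marked ERROR.

```shell
# Print "filename page_count" for every PDF in the directory given as $1.
count_pdf_pages() {
    local dir="$1" f pages
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue     # nothing matched the glob
        # pdfinfo prints a line such as "Pages:          150"
        pages=$(pdfinfo "$f" 2>/dev/null | awk '/^Pages:/ {print $2}')
        printf '%s %s\n' "${f##*/}" "${pages:-ERROR}"
    done
}
```

    It could be called as, for example, count_pdf_pages /path/to/pdfs > pdf_pages.txt (the path is a placeholder).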

    http://en.opensuse.org/User:Knurpht
    http://nl.opensuse.org/Gebruiker:Knurpht

  6. #6
    Join Date
    Aug 2011
    Location
    India
    Posts
    205

    I am glad to say that I have succeeded and I am going to explain the process.

    As I guessed, pdfinfo and djvused did not work in my scripts with filenames containing spaces, so I replaced the spaces with the character '_' in the names of all the PDF and djvu files. After that I used two scripts, one for the PDFs and one for the djvus:

    Code:
    cd /run/media/root/Others/temp/djvus2
    for f in *.djvu
    do
        echo "file name:  $f";
        djvused -e n $f;
    done
    Code:
    cd /run/media/root/Source/temp/pdfs2
    
    for f in *.pdf
    do
        echo "file name:  $f";
        pdfinfo $f; 
    done
    I ran the above scripts as follows

    Code:
    ./count_pdf.sh > pdf_output.txt 2> pdf_errors.txt
    ./count_djvu.sh > djvu_output.txt 2>> djvu_errors
    Executing the above scripts generated two huge text files named pdf_output.txt and djvu_output.txt. I then used grep to extract the filenames and page counts from the two files and compared them using the diff tool. Up to 95 percent of the PDF files were converted successfully.
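    The exact grep and diff commands are not given in the post. As a rough sketch of the comparison step, assuming each list has already been reduced to lines of the form "name.pdf count" and "name.djvu count", and that the names differ only in their extension (compare_counts is a hypothetical function name):

```shell
# Compare two "filename count" lists, ignoring the .pdf/.djvu extension.
# $1: list for the PDFs, $2: list for the djvu files.
compare_counts() {
    pdf_tmp=$(mktemp) && djvu_tmp=$(mktemp) || return 2
    # Drop the extension in front of the trailing count so the names line up
    sed 's/\.pdf \([^ ]*\)$/ \1/' "$1" | sort > "$pdf_tmp"
    sed 's/\.djvu \([^ ]*\)$/ \1/' "$2" | sort > "$djvu_tmp"
    diff "$pdf_tmp" "$djvu_tmp"
    status=$?
    rm -f "$pdf_tmp" "$djvu_tmp"
    return "$status"
}
```

    diff prints nothing and returns 0 when every count matches; any line it does print points at a file whose page counts differ or that is missing from one of the lists.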

    pdfinfo gave four kinds of errors, seemingly at random, for up to 150 files. They are listed below

    Code:
    Syntax Error: Expected the optional content group list, but wasn't able to find it, or it isn't an Array
    Syntax Error: Marked object is wrong type (boolean)
    Syntax Warning: Invalid least number of objects reading page offset hints table
    Syntax Warning: Invalid number of shared object groups
    djvused gave no errors at all.

    May I know the meaning of the pdfinfo errors?

    Thanks to knurpht and I_A for the valuable suggestions.
    Regards,
    Rupesh.

  7. #7

    The pdfinfo errors are related to bad PDF files, usually introduced during PDF generation. You could fix them, but as you have thousands of files that's too much work. If the files render in a PDF viewer, leave them alone; the internal errors can usually be ignored.

  8. #8
    Join Date
    Aug 2011
    Location
    India
    Posts
    205

    Quote Originally Posted by I_A
    The pdfinfo errors are related to bad PDF files, usually introduced during PDF generation. You could fix them, but as you have thousands of files that's too much work. If the files render in a PDF viewer, leave them alone; the internal errors can usually be ignored.
    Thanks for your suggestion. I am going to ignore those errors.
    Regards,
    Rupesh.

  9. #9

    Quote Originally Posted by rupeshforu3
    As I have guessed pdfinfo and djvused doesn't work with filenames containing spaces and so I have replaced spaces with character '_' of all pdf files and djvu files.
    Hi,

    The scripts failed because you did not quote the variable $f. Always use "$f" (inside double quotes) to prevent word splitting and to stop the shell from applying pathname expansion to characters such as * and ? in the value.

    Code:
    cd /run/media/root/Others/temp/djvus2
    for f in *.djvu
    do
        echo "file name:  $f";
        djvused -e n "$f"
    done
    Code:
    cd /run/media/root/Source/temp/pdfs2
    
    for f in *.pdf
    do
        echo "file name:  $f"
        pdfinfo "$f" 
    done
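    Building on the quoted loops above, the "Filename.djvu page_count" format asked for in the first post could be produced roughly like this. It is a sketch rather than a tested recipe for the 2400 files: count_djvu_pages is a made-up function name, and a failing djvused call is simply recorded as ERROR.

```shell
# Print "filename page_count" for every djvu file in the directory given as $1.
count_djvu_pages() {
    local dir="$1" f pages
    for f in "$dir"/*.djvu; do
        [ -e "$f" ] || continue     # nothing matched the glob
        # "djvused -e n" prints the page count; note the quotes around $f
        pages=$(djvused -e n "$f") || pages="ERROR"
        printf '%s %s\n' "${f##*/}" "$pages"
    done
}
```

    A call such as count_djvu_pages /path/to/djvus > djvu_pages.txt (placeholder path) then writes all the results to one file, with no need to rename anything.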
    "Unfortunately time is always against us" -- [Morpheus]

  10. #10
    Join Date
    Aug 2011
    Location
    India
    Posts
    205

    Quote Originally Posted by jetchisel
    Because you did not quote the variable $f, always use "$f" (inside double quotes) to prevent word splitting and to avoid the expansion of special characters that the shell might interpret.
    Does your advice apply to all cases in scripting and the terminal? I have a text file that contains file names with spaces and special characters like %, and I want to create those files in the current directory. Is it possible to create the files without any modifications? I mean, if there are 1000 lines in the text file, I want to create files named exactly as in the text file.

    Previously I placed touch " at the beginning of each line and " at the end, so the lines follow the pattern
    touch "file name.mp3"
    I then copied all the lines to a new shell script named create_directories.sh. When I executed that script I got a number of errors.

    Please try the command below and let me know how to create files with white space in their names.
    Code:
    touch "one two three"
    Regards,
    Rupesh.
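    The errors from the generated script are not shown, so the cause can only be guessed at; a name containing " or $, for instance, would break a generated touch "..." line. One way to sidestep the quoting problem entirely is to read the names line by line instead of pasting them into a script. A minimal sketch, where create_files_from_list is a hypothetical function name and the list is assumed to hold one file name per line with no "/" in the names:

```shell
# Create one empty file in the current directory per line of the list file $1.
create_files_from_list() {
    local name
    # IFS= and -r keep leading spaces and backslashes in the names intact;
    # the extra test handles a last line without a trailing newline.
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue      # skip empty lines
        touch -- "$name"                # -- protects names starting with -
    done < "$1"
}
```

    Because each name only ever appears as "$name", characters such as spaces and % are passed to touch unchanged.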
