
Thread: path names in a PERL-script -

  1. #1

    Default path names in a PERL-script -

    Hi all - hello Community,

    I am new to Linux and new to Perl too. I am trying to get this Perl script up and running. I have installed openSUSE 11.3.

    What is wanted: I have a bunch of HTML files stored in a folder.
    With the Perl script (see below) I want to parse the HTML files.

    I have stored the script in the following place:

    Basisfolder [German!] > user > perl >

    My question is how to name the paths ...

    a. to the folder that contains the HTML files that need to be parsed (I named this folder html.files)
    b. to the file that has to be created...


    I suggest that this file is also located in the same directory: Basisfolder [German!] > user > perl >

    I guess that this makes it easy...


    Please bear with me for the noob questions. If I have to explain more, please let me know!


    Love to hear from you - many thanks in advance for any and all help.

    dilbertone!


    see here the code...

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;
    use HTML::TokeParser;

    # The two paths in question (placeholder values, not yet the right names):
    my $html_dir = 'html.files';    # folder that contains the HTML files
    my $file     = 'school.html';   # a single HTML file to parse

    # collect all *.html files below $html_dir (not used further down yet)
    my @html_files = File::Find::Rule->file->name('*.html')->in($html_dir);
    my $p = HTML::TokeParser->new($file) or die "Can't open: $!";

    my %school;
    while (my $tag = $p->get_tag('div', '/html')) {
        # first move to the right div that contains the information
        last if $tag->[0] eq '/html';
        next unless exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'inhalt_large';

        $p->get_tag('h1');
        $school{'location'} = $p->get_text('/h1');

        while (my $tag = $p->get_tag('div')) {
            last if exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'fusszeile';

            # get the school name from the heading
            next unless exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'fm_linkeSpalte';
            $p->get_tag('h2');
            $school{'name'} = $p->get_text('/h2');

            # verify format for school type
            $tag = $p->get_tag('span');
            unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'schulart_text') {
                warn "unexpected format: parsing stopped";
                last;
            }
            $school{'type'} = $p->get_text('/span');

            # verify format for address
            $tag = $p->get_tag('p');
            unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'einzel_text') {
                warn "unexpected format: parsing stopped";
                last;
            }
            $school{'address'} = clean_address($p->get_text('/p'));

            # find the description
            $tag = $p->get_tag('p');
            $school{'description'} = $p->get_text('/p');
        }
    }

    print qq/$school{'name'}\n/;
    print qq/$school{'location'}\n/;
    print qq/$school{'type'}\n/;

    foreach (@{ $school{'address'} }) {
        print "$_\n";
    }

    print qq/\nDescription$school{'description'}\n/;

    # split the address text into lines and trim surrounding whitespace
    sub clean_address {
        my $text  = shift;
        my @lines = split "\n", $text;
        foreach (@lines) {
            s/^\s+//;
            s/\s+$//;
        }
        return \@lines;
    }

    Love to hear from you!

  2. #2
    Join Date
    Jan 2009
    Location
    43.009 N, 73.172 W
    Posts
    211

    Default Re: path names in a PERL-script -

    I'm not sure exactly what you're asking in your question, but I'll give an answer based on what I think your question might be.

    If you want to parse a set of filenames into an array, open a directory and read the contents of that directory into the array. In Linux, a folder is called a directory.

    Code:
    opendir (THISDIR, $HOME) or warn "Could not open the dir ".$HOME.": $!";
    @allfiles = grep !/^\.\.?$/, readdir THISDIR;
    closedir THISDIR;
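    The opendir call opens the directory named in $HOME, readdir returns its entries, and the grep drops the "." and ".." entries before the rest is stored in the array @allfiles. You can then print the array to STDOUT to check that it holds what you expect.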

    $HOME can be declared at the top of your program like this:
    Code:
    my $HOME ="/home/username/htmlfiles";
    #or whatever your path actually is.
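
    A minimal sketch of the same listing, written so that it also runs under "use strict": it uses a lexical directory handle and die instead of warn, so the script stops as soon as the directory cannot be opened. The sub name list_dir, the temporary directory and the two file names are assumptions made only for this demonstration; substitute your real path.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    # Return the names of all entries in $dir except "." and "..", sorted.
    sub list_dir {
        my ($dir) = @_;
        opendir(my $dh, $dir) or die "Could not open the dir $dir: $!";
        my @entries = grep { !/^\.\.?$/ } readdir $dh;
        closedir $dh;
        return sort @entries;
    }

    # Demo on a throwaway directory (hypothetical file names).
    my $dir = tempdir(CLEANUP => 1);
    for my $name ('a.html', 'b.html') {
        open(my $fh, '>', "$dir/$name") or die "Cannot create $dir/$name: $!";
        close $fh;
    }

    my @allfiles = list_dir($dir);
    print "$_\n" for @allfiles;
    ```

    Note that readdir returns bare names, not full paths, so each name has to be joined with the directory again before the file can be opened.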

    I hope this helps.
    Box: Home Built | Intel Core2 @2.4 GHz | 6 GB | OpenSUSE 11.4| KDE 4.6.0 r6| nVidia GeForce 7300 GT

  3. #3

    Default Re: path names in a PERL-script -

    Hello udaman - many thanks for the quick reply.

    My question is about the I/O handling: I have to find the right path names, that is, names that follow the Linux conventions...

    I took your example and made some slight corrections...

    i wrote this

    perl_script_three.pl

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;


    my $HOME ="home/usr/perl/html.files";
    opendir (THISDIR, $HOME) or warn "Could not open the dir ".$HOME.": $!";
    @allfiles = grep !/^\.\.?$/, readdir THISDIR;
    closedir THISDIR;
    response:

    suse-linux:/usr/perl # perl perl_script_three.pl
    Global symbol "@allfiles" requires explicit package name at perl_script_three.pl line 10.
    Execution of perl_script_three.pl aborted due to compilation errors (#1)
    (F) You've said "use strict" or "use strict vars", which indicates
    that all variables must either be lexically scoped (using "my" or "state"),
    declared beforehand using "our", or explicitly qualified to say
    which package the global variable is in (using "::").

    Uncaught exception from user code:
    Global symbol "@allfiles" requires explicit package name at perl_script_three.pl line 10.
    Execution of perl_script_three.pl aborted due to compilation errors.
    at perl_script_three.pl line 12
    suse-linux:/usr/perl #
    i am not sure - have i done something wrong!?

    Any and all help is greatly appreciated

    dilbertone

  4. #4
    Join Date
    Jun 2008
    Location
    UTC+10
    Posts
    9,941
    Blog Entries
    4

    Default Re: path names in a PERL-script -

    When you use strict; you must declare all variables before use instead of relying on Perl to let you create them on first use, which could hide errors in the program. The quickest fix is to add my in front of the first @allfiles, i.e.

    my @allfiles = grep !/^\.\.?$/, readdir THISDIR;
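
    A small sketch of the difference, in case it helps: the same assignment is compiled twice with a string eval, once with "my" and once without. The file names and the variable names $declared_ok, $undeclared_ok and $error are made up for the demonstration.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Declared with "my": compiles and runs under strict.
    my $declared_ok = eval q{
        use strict;
        my @allfiles = ('a.html', 'b.html');
        scalar @allfiles;
    };

    # Not declared: strict rejects it at compile time, so eval returns undef
    # and the compiler's message ends up in $@.
    my $undeclared_ok = eval q{
        use strict;
        @undeclared = ('a.html', 'b.html');
        1;
    };
    my $error = $@;

    print "declared array has $declared_ok entries\n";
    print "undeclared array failed: $error" unless $undeclared_ok;
    ```

    The second message is the same "Global symbol ... requires explicit package name" error that perl_script_three.pl produced.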

  5. #5
    Join Date
    Jan 2009
    Location
    43.009 N, 73.172 W
    Posts
    211

    Default Re: path names in a PERL-script -

    ken_yap is right about adding the 'my' in front of the array. I usually declare all my variables, including arrays, at the beginning of the program, so when I cut and pasted that snippet of code the declaration wasn't included. Just as you declared 'my $HOME', you should declare 'my @allfiles', but only the first time it's used.

    Are you trying to split out the path of a given file? Say a file lives in "/home/usr/perl/html.files/file1.html". You can split that path on the slashes into an array and drop the last element (the file name), leaving the path to it in the array. Or find a Perl module that does the work for you.
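
    One module that does this is File::Basename, which ships with Perl. A minimal sketch using the example path from above (the variable names are just for illustration):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Basename qw(basename dirname);

    my $path = '/home/usr/perl/html.files/file1.html';   # example path only

    my $dir  = dirname($path);    # everything before the last slash
    my $name = basename($path);   # the part after the last slash

    print "directory: $dir\n";
    print "file name: $name\n";
    ```

    Neither function touches the disk, so this works even if the path does not exist.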

  6. #6

    Default Re: path names in a PERL-script -

    Hello again, you both - many thanks for the great and supportive help!



    I have reworked the two scripts that I created in order to find the right paths, but I have had no luck. The scripts are placed in

    home > usr > perl

    I have the two scripts
    a. perl_script_two.pl
    b. perl_script_three.pl

    The directory with the 20,000 HTML files is there as well. See further above.

    I found out that I made some mistakes when talking about the HTML files. Note: there are more than 20,000 HTML files in the directory, which I have renamed to htmlfiles (instead of html.files).
    Important: the files themselves are all named according to the following scheme:

    einzelergebnis1...
    einzelergebnis2...
    einzelergebnis3a...
    einzelergebnis3b...
    einzelergebnis3d...

    You can see this reflected in script two, where I name the pattern accordingly...

    Code:
    my @files = File::Find::Rule->file()
                     ->name('einzelergebnis*.html')

    So here we go: I start the scripts in the console as follows and get the results shown below.

    suse-linux:/usr/perl # perl perl_script_two.pl

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;
    my @files = File::Find::Rule->file()
                     ->name('einzelergebnis*.html')
                     ->in('/home/usr/perl/htmlfiles');
    foreach my $file (@files) {
            print $file, "\n";
    }


    Results:
    Can't stat /home/usr/perl/htmlfiles: No such file or directory
    at /usr/lib/perl5/site_perl/5.12.1/File/Find/Rule.pm line 594



    perl_script_three.pl

    suse-linux:/usr/perl # perl perl_script_three.pl

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;


    my $HOME ="/home/usr/perl/htmlfiles";
    opendir (THISDIR, $HOME) or warn "Could not open the dir ".$HOME.": $!";
    my @allfiles = grep !/^\.\.?$/, readdir THISDIR;
    closedir THISDIR;

    Results:
    Could not open the dir /home/usr/perl/htmlfiles: No such file or directory at perl_script_three.pl line 9.
    readdir() attempted on invalid dirhandle THISDIR at perl_script_three.pl line
    10 (#1)
    (W io) The dirhandle you're reading from is either closed or not really
    a dirhandle. Check your control flow.
    closedir() attempted on invalid dirhandle THISDIR at perl_script_three.pl line
    11 (#2)
    (W io) The dirhandle you tried to close is either closed or not really
    a dirhandle. Check your control flow.


    So I am a bit clueless.


    Perhaps I should use a simpler script for these preliminary tests.

    I look forward to hearing from you!

    regards
    dilbertone

  7. #7
    Join Date
    Jan 2009
    Location
    43.009 N, 73.172 W
    Posts
    211

    Default Re: path names in a PERL-script -

    Did you read the error message? In both cases, the error message is saying it can't find the directory. Sounds like it doesn't exist. Check that /home/usr/perl/htmlfiles/ is exactly as you think it is. Post here the output of
    Code:
    ls -d /home/usr/perl/htmlfiles
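
    The same check can be made from inside Perl with the file test operators -e (exists) and -d (is a directory). A minimal sketch; the sub name describe_path is made up for this example:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Say what $path is, using Perl's file test operators.
    sub describe_path {
        my ($path) = @_;
        return 'missing'   unless -e $path;
        return 'directory' if -d $path;
        return 'not a directory';
    }

    for my $path ('/home/usr/perl/htmlfiles', '.') {
        print "$path: ", describe_path($path), "\n";
    }
    ```

    Running a check like this before opendir or File::Find::Rule gives a clearer message than the errors those calls produce on their own.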

  8. #8

    Default Re: path names in a PERL-script -

    Hello udaman! Good evening!


    Many thanks to you! I did as you advised and will post those results later this evening!

    In the meantime, here are some first results I have gained so far.

    You remember the script that I introduced further above (see also below):

    I tried changing the "in" so that it looks in the directory the script itself is in: ->in( '.' );

    That means I changed from ...

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;
    my @files = File::Find::Rule->file()
                     ->name('einzelergebnis*.html')
                     ->in('/home/usr/perl/htmlfiles');
    foreach my $file (@files) {
            print $file, "\n";
    }


    to this

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use File::Find::Rule;
    my @files = File::Find::Rule->file()
                     ->name('einzelergebnis*.html')
                     ->in('.');
    foreach my $file (@files) {
            print $file, "\n";
    }



    and then i got the following output:

    htmlfiles/einzelergebnis80b5.html
    htmlfiles/einzelergebnisa0ef.html
    htmlfiles/einzelergebnis1b42.html
    htmlfiles/einzelergebnis5960.html
    htmlfiles/einzelergebnise523.html
    htmlfiles/einzelergebnis2c7e.html
    htmlfiles/einzelergebnisdf57.html
    htmlfiles/einzelergebnis2b53-2.html
    htmlfiles/einzelergebnisb1c0-2.html
    ....and 22 thousand lines further... ;-)
    This seems to be the starting point! Now I can continue figuring out how to configure the parser script (see below). With the I/O handle issues and the path names in general nailed down, the HTML parser script is what has to be configured next. Everything that follows is about this script.

    Well, this means I have

    a. to define the path to the file in $file, and furthermore ...
    b. to define a path in $html_dir

    In other words, I need to define the paths to

    a. the directory that contains the files that need to be parsed (see above), and
    b. the file that has to be created.

    The first task can be solved with what I learned in the preliminary tasks above:
    I have to look for the files in the directory called "htmlfiles".

    Does that mean I have to change the following line?
    Code:
    my $file = 'school.html';
    BTW - what does the array @html_files do?
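
    For what it is worth, a hedged sketch of what such an array is for: it holds one path string per file found, so the parser can be run once per file in a loop instead of on a single $file. The sub name headings_in is made up, reading only the first h1 is a stand-in for the real parsing code, and 'htmlfiles' is assumed to be relative to the directory the script is started from.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find::Rule;
    use HTML::TokeParser;

    # Return a hash ref mapping each matching file to the text of its first <h1>.
    sub headings_in {
        my ($html_dir) = @_;

        # @html_files holds one path string per file found under $html_dir.
        my @html_files = File::Find::Rule->file()
                                         ->name('einzelergebnis*.html')
                                         ->in($html_dir);
        my %heading;
        foreach my $file (@html_files) {
            my $p = HTML::TokeParser->new($file);
            unless ($p) {
                warn "Can't open $file: $!";
                next;
            }
            # Stand-in for the real parsing code, which would also use $p.
            $heading{$file} = $p->get_text('/h1') if $p->get_tag('h1');
        }
        return \%heading;
    }

    my $html_dir = 'htmlfiles';   # assumption: relative to the current directory
    if (-d $html_dir) {
        my $heading = headings_in($html_dir);
        print "$_: $heading->{$_}\n" for sort keys %$heading;
    }
    ```

    Because each file gets its own parser object, one unreadable file only produces a warning and the loop carries on with the next one.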

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TokeParser;

    my $file = 'school.html';
    my $p = HTML::TokeParser->new($file) or die "Can't open: $!";

    my %school;
    while (my $tag = $p->get_tag('div', '/html')) {
        # first move to the right div that contains the information
        last if $tag->[0] eq '/html';
        next unless exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'inhalt_large';

        $p->get_tag('h1');
        $school{'location'} = $p->get_text('/h1');

        while (my $tag = $p->get_tag('div')) {
            last if exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'fusszeile';

            # get the school name from the heading
            next unless exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'fm_linkeSpalte';
            $p->get_tag('h2');
            $school{'name'} = $p->get_text('/h2');

            # verify format for school type
            $tag = $p->get_tag('span');
            unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'schulart_text') {
                warn "unexpected format: parsing stopped";
                last;
            }
            $school{'type'} = $p->get_text('/span');

            # verify format for address
            $tag = $p->get_tag('p');
            unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'einzel_text') {
                warn "unexpected format: parsing stopped";
                last;
            }
            $school{'address'} = clean_address($p->get_text('/p'));

            # find the description
            $tag = $p->get_tag('p');
            $school{'description'} = $p->get_text('/p');
        }
    }

    print qq/$school{'name'}\n/;
    print qq/$school{'location'}\n/;
    print qq/$school{'type'}\n/;

    foreach (@{ $school{'address'} }) {
        print "$_\n";
    }

    print qq/\nDescription$school{'description'}\n/;

    sub clean_address {
        my $text  = shift;
        my @lines = split "\n", $text;
        foreach (@lines) {
            s/^\s+//;
            s/\s+$//;
        }
        return \@lines;
    }


    i look forward to any and all help! I really appreciate a helping hand here... Many many thanks for all you did so far! This is a great place for knowledge sharing!!

    metabo

  9. #9

    Default Re: path names in a PERL-script -

    Hello Udaman

    regarding your question - here an answer:

    suse-linux:/usr/perl # cd /usr^C
    suse-linux:/usr/perl # ls -d /home/usr/perl/htmlfiles
    ls: cannot access /home/usr/perl/htmlfiles: No such file or directory
    suse-linux:/usr/perl #

    The same goes for this command:
    ls -al /home/usr/perl/htmlfiles/


    Udaman: what does this mean for the task of configuring the HTML parser script (see above)?

    I need to define the path to the file in $file and, furthermore, a path in $html_dir.


    It is a bit confusing! Have I done something wrong? Why do I get such results?



    i do not understand this

    regards dilbert

  10. #10
    Join Date
    Jan 2009
    Location
    43.009 N, 73.172 W
    Posts
    211

    Default Re: path names in a PERL-script -

    The answer is very simple. The directory that you have the script in (".") is not the same directory that your script is looking in ("/home/usr/perl/htmlfiles"). That directory doesn't exist; that's what the error message is telling you. Either create the directory and move the files there, or use the path that the files are actually in.

    When you are in the same directory that the script is in, do "pwd", and that will give you the correct path to the files. Use it in your script. Before you continue with your Perl class, you should take a class in basic Unix/Linux commands.
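
    A minimal sketch of getting that path from inside the script rather than typing it in, assuming the htmlfiles directory sits next to the script. Cwd, FindBin and File::Spec all ship with Perl; $Bin is the directory the running script lives in.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Cwd qw(getcwd);
    use FindBin qw($Bin);
    use File::Spec;

    my $cwd      = getcwd();                               # what "pwd" prints
    my $html_dir = File::Spec->catdir($Bin, 'htmlfiles');  # assumed layout

    print "current directory:         $cwd\n";
    print "script directory:          $Bin\n";
    print "looking for HTML files in: $html_dir\n";
    print "that directory ", (-d $html_dir ? "exists" : "does not exist"), "\n";
    ```

    Building the path from $Bin means the script finds the files no matter which directory it is started from.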
