Perl Mechanize: how to process a simple loop [plz review my approach]

dilbertone · May 18, 2011, 9:03pm

Hello Community!

I am getting into Perl programming. I want to learn something.

I am currently working on a small solution. I have tried various tutorials (examples of Mechanize that I found on CPAN); not all of them work - some of them are broken!

Now I want to try a real-world task!

Especially interesting for me as a PHP/Perl-beginner is this site in Switzerland:
see this link - click on it and see more

It has a dataset of 2700 foundations. All the data are free to use, with no copyright limitations.

**What we have so far:** The harvesting task should be no problem if I take WWW::Mechanize - particularly for doing the form-based search and selecting the individual entries.
I guess that the algorithm would basically be 2 nested loops: the outer loop runs the form-based search, the inner loop processes the search results.

The outer loop would use the select() and submit_form() functions
on the second search form on the page. Can we use DOM processing here?

And how can we get the selection values?
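
One possible way to read the selection values: WWW::Mechanize hands out forms as HTML::Form objects, and each input can list the values it accepts. The sketch below parses an inline sample instead of the real page; the field name `canton`, the option values and the helper `selection_values` are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Form;

# Stand-in for the real search page; field name and options are invented.
my $html = <<'HTML';
<form action="/search" method="get">
  <select name="canton">
    <option value="">-- all --</option>
    <option value="ZH">Zurich</option>
    <option value="BE">Bern</option>
  </select>
  <input type="submit" value="Search">
</form>
HTML

# With a live page, $mech->form_number(2) returns the same kind of object.
my ($form) = HTML::Form->parse( $html, 'http://example.org/' );

# Return every non-empty value the named form field can take.
sub selection_values {
    my ( $form, $field ) = @_;
    my $input = $form->find_input($field)
        or die "no input named '$field'";
    return grep { defined && length } $input->possible_values;
}

my @values = selection_values( $form, 'canton' );
print "$_\n" for @values;
```

Each value could then be passed to `$mech->select( $field, $value )` before the form is submitted.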

The inner loop through the results would use the follow_link function to get to the actual entries, using the following call:

$mech->follow_link(url_regex => qr/webgrab_path=http:\/\/evs2000.*\?Id=\d+$/, n => $result_nbr);

This would forward our Mechanize browser to the entry page. Basically the URL query looks for links that have the webgrab_path to Id pattern, which is unique for each database entry. The $result_nbr variable tells Mechanize which one of the results it should follow next.
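
To see what the pattern accepts and how `n` picks one match, the regex can be tried on a few strings without any network access. This is only an illustration: the URLs below are invented stand-ins, not links from the real site.

```perl
use strict;
use warnings;

# Same pattern as in the call above, written with qr{} so slashes need no escaping.
my $entry_re = qr{webgrab_path=http://evs2000.*\?Id=\d+$};

# Made-up links standing in for what a result page might contain.
my @links = (
    'http://example.org/search?page=2',
    'http://example.org/show?webgrab_path=http://evs2000.example/detail?Id=17',
    'http://example.org/show?webgrab_path=http://evs2000.example/detail?Id=23',
    'http://example.org/imprint',
);

# Keep only the links that look like database entries.
my @entries = grep { $_ =~ $entry_re } @links;

# follow_link counts matches from 1, so n => 2 would pick the second entry.
my $result_nbr = 2;
my $picked     = $entries[ $result_nbr - 1 ];

print scalar(@entries), " entry links found\n";
print "n => $result_nbr selects: $picked\n";
```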

If we have several result pages, we could use the same trick to traverse the result pages.
For the semantic extraction of the entry information, we could parse the content of the actual entries with XML::LibXML’s HTML parser (which works fine on this page), because it gives you some powerful DOM selection methods (using XPath).
The actual looping through the pages should be doable in at most 20 lines of Perl, likely fewer.
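
A minimal sketch of the XML::LibXML idea, assuming an entry page that shows its data as a label/value table. The markup, the class name `entry` and the field labels are invented; the real page will need different XPath expressions.

```perl
use strict;
use warnings;
use XML::LibXML;

# Invented entry page; the real markup will differ.
my $html = <<'HTML';
<html><body>
<h1>Example Foundation</h1>
<table class="entry">
  <tr><th>Place</th><td>Bern</td></tr>
  <tr><th>Purpose</th><td>Supports education</td></tr>
</table>
</body></html>
HTML

# recover => 2 lets libxml2 repair sloppy HTML without printing warnings.
# With Mechanize, the string would come from $mech->content.
my $dom = XML::LibXML->load_html( string => $html, recover => 2 );

# Collect the heading plus one hash entry per table row.
my %entry = ( name => $dom->findvalue('//h1') );
for my $row ( $dom->findnodes('//table[@class="entry"]//tr') ) {
    my $label = $row->findvalue('./th');
    my $value = $row->findvalue('./td');
    $entry{$label} = $value;
}

print "$_: $entry{$_}\n" for sort keys %entry;
```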

But wait: the processing of the entry pages will then be the most complex part
of the script.

Approaches: In principle we could do the same algorithm with a single while loop
if we use the back() function smartly.
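
The single-loop idea could look roughly like this. It is a sketch, not tested against the real site: `visit_entries` and its callback are hypothetical names, and `find_link` is used instead of `follow_link` because `find_link` simply returns undef when there is no n-th match, even with autocheck enabled.

```perl
use strict;
use warnings;

# Visit every link matching $entry_re on the current page, one at a time.
# $mech is expected to be a WWW::Mechanize object; $handler is a callback
# that extracts whatever is needed from the entry page.
sub visit_entries {
    my ( $mech, $entry_re, $handler ) = @_;
    my $n = 1;
    # find_link returns undef once there is no n-th matching link.
    while ( my $link = $mech->find_link( url_regex => $entry_re, n => $n ) ) {
        $mech->get( $link->url_abs );    # go to the entry page
        $handler->($mech);
        $mech->back;                     # return to the result list
        $n++;
    }
    return $n - 1;                       # number of entries visited
}

# Run only when a start URL is given on the command line.
if (@ARGV) {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get( $ARGV[0] );
    my $count = visit_entries(
        $mech,
        qr{webgrab_path=http://evs2000.*\?Id=\d+$},
        sub { my ($m) = @_; print $m->uri, "\n" },
    );
    print "$count entries visited\n";
}
```

Passing the extraction step in as a callback keeps the loop itself short, which fits the estimate of well under 20 lines.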

Can you give me a hint for the beginning - the processing of the entry pages - doing this in WWW::Mechanize?

Looking forward to hearing from you

regards
db1

stamostolias · May 18, 2011, 9:09pm

Hello. Unfortunately I cannot help you much. I am a programmer, but I develop in C++, Fortran and Qt, and with the Linux kernel.
Anyway, I will try to find something for you.
Read these first:
The Perl Programming Language - www.perl.org
Beginning Perl (free) - www.perl.org
Perl - Wikipedia, the free encyclopedia

dilbertone · May 18, 2011, 9:32pm

hello dear stamostolias - many many thanks for the quick reply. Great to hear from you!

I will dig deeper into those readings… Besides these good readings, if you can give me some additional hints I would be more than happy…

many many greetings to Greece!!
dilbertone!!

stamostolias · May 18, 2011, 9:34pm

I will try to find something good for you.
Thank you.

If you want to know something about programming and development, or want to learn a programming language (C++, Fortran, Qt), send me a PM. Do not worry.

dilbertone · May 19, 2011, 12:28am

Good morning dear stamostolias! καλημέρα!

Many thanks to you! This is such a great place to be! I am very happy!

Have a great day.

Looking forward to hearing from you!

Many greetings to you dear friend in Athens /Greece !!

dilbertone

stamostolias · May 19, 2011, 9:40am

I am happy to help, and this problem can be solved. I will send you some more files about it. Anyway, how did you decide to learn the Perl programming language?? It would be better to start with an object-oriented programming language.

dilbertone · May 19, 2011, 10:45am

hi Stamos -

thx for replying!! Quite busy at the moment - I will come back later today.

warm regards
matze / aka db1

djh-novell · May 19, 2011, 12:12pm

stamostolias wrote:
> I am happy to help, and this problem can be solved. I will send
> you some more files about it. Anyway, how did you decide to learn the Perl
> programming language?? It would be better to start with an
> object-oriented programming language.

Let’s have a religious war

Perl is an object-oriented language

http://perldoc.perl.org/search.html?q=object

Cheers, Dave

PS dilbert1, PHP is not the same thing as Perl. Don’t confuse the two.

stamostolias · May 19, 2011, 5:08pm

It is an object-oriented language, but in my opinion it is better to start with the basics. I mean C, C++ etc.

What do you mean by religious war??

dilbertone · May 19, 2011, 7:33pm

hello you both! hello Stamos and djh-Novell,

great to read your posts ;-) Well, I guess that djh-novell does not want to start a real religious war! Programmers like their languages for specific reasons and with regard to specific pros (and cons).

I have had a look at Python. I like it because it is very easy to read. Stamos, you are right - or let me say: I can understand your ideas regarding learning programming languages…

I had a look at some languages. I think Python is quite a nice language. One thing I’ve noticed between programming in Perl and Python: it’s sometimes easy to write a Perl program, which is why they’re pretty much a dime a dozen. But reading one, on the other hand… it can be so painful. So if I have time left, I will start to learn Python. Python seems like a good choice: it’s easy to read, fairly powerful and, compared to some other languages (such as C, C++ and Java), it doesn’t seem too complicated.

But back to the problem written above. I have some difficulties writing the code in Mechanize to get the entry page…

see this link here - the page is a Swiss page with a database of more than 2700 records

Can you (both) give me a hint for the beginning - the processing of the entry pages - doing this in WWW::Mechanize or any other language…?

Looking forward to hearing from you both…

greetings from Germany
dillbertone / matze:)

stamostolias · May 19, 2011, 7:46pm

If you want guides for Python, I have written some in the Greek subforum.

dilbertone · May 20, 2011, 12:38am

hello Stamos & hello to all Perlers,

besides the fact that my harvester could also be created with BeautifulSoup, a Python library, I am sticking with Perl and got stuck on the program logic

… regarding the issue with the little harvester - running Mechanize:

see the target: see the link - click here

I still need some hints for the beginning - the processing of the entry pages - doing this in WWW::Mechanize

e.g. like this:

GetThePage(
    starting url
);

sub GetThePage {
    my $mech ...
    my @pages = ...
    while (@pages) {
        my $page = shift @pages;
        $mech->get($page);
        push @pages, GetMorePages($mech);
        SomethingImportant($mech);
        SomethingXPATH($mech);
    }
}

Well - the question is: how do I find the DOM paths?

Any ideas…
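
One way to discover usable DOM paths without a browser tool is to let XML::LibXML print the XPath of every element that carries text. This is a sketch on an invented fragment; the markup and the hash `%path_for` are illustrative only, and the real page will produce different paths.

```perl
use strict;
use warnings;
use XML::LibXML;

# Invented fragment; in the harvester the string would be $mech->content.
my $html = <<'HTML';
<html><body>
<div id="entry"><h2>Example Foundation</h2><p>Bern</p></div>
</body></html>
HTML

my $dom = XML::LibXML->load_html( string => $html, recover => 2 );

# Map each non-blank text node to the XPath of its parent element.
my %path_for;
for my $text ( $dom->findnodes('//text()[normalize-space()]') ) {
    ( my $content = $text->nodeValue ) =~ s/^\s+|\s+$//g;
    $path_for{$content} = $text->parentNode->nodePath;
}

print "$path_for{$_}  =>  $_\n" for sort keys %path_for;
```

The printed paths can be pasted into `findnodes` or `findvalue` and then shortened by hand, for example by anchoring on an `id` attribute instead of the full path from the root.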

djh-novell · May 20, 2011, 11:25am

stamostolias wrote:
> What do you mean by religious war??

Sorry, I forget cultural translation problems. It’s an expression used
in [Anglo-Saxon at least] computing circles to mean an argument that is
based mainly on personal opinions, and that therefore no-one can win and
which will become extremely heated and personal. A flame-fest!

An example would be which is better, vi or emacs? Or practically any
comparison of programming language A against programming language B.

yasar11732 · May 20, 2011, 12:59pm

Of course vi

P.S sorry couldn’t stop myself saying that…

stamostolias · May 20, 2011, 1:22pm

There is no religious war, just different solutions to the same problem.

djh-novell · May 20, 2011, 1:10pm

yasar11732 wrote:
> djh-novell wrote:
>> which is better, vi or emacs?
>
> Of course vi
>
> P.S sorry couldn’t stop myself saying that…

Well of course you’re right, so you’re not going to provoke me like that

stamostolias · May 20, 2011, 6:49pm

dilbertone wrote:
> Well - the question is: how do I find the DOM paths?

Read this: Firebug and DOM Exploration (Firebug)
Also this: http://cpansearch.perl.org/src/MIROD/XML-DOM-XPath-0.14/XPath.pm
And this: Perl-XML Module List
And this: XML::DOM::XPath

stamostolias · May 20, 2011, 7:11pm

It is very easy, my friend from Germany, to learn programming. I began with Pascal (here in Greece Pascal is the basis of programming lessons), then I learned C++ (now my main programming language; I use it a lot for development). Qt is an application to build new applications and programs; it is like VB, but it is only for Linux and only for C++. Then I learned Fortran (because I am very good at mathematics, I use it for mathematical applications) and Python (I do not like it so much; I mainly use C++ for everything).

Barry_Nichols · May 21, 2011, 12:19am

Hi,

There is just so much wrong with this:

Qt is not an application, it is a user interface framework.
Qt is cross platform, this happens to be the main selling point, so not just for Linux.
Qt is written in C++, but it can be used with many programming languages.

Also, I’d be interested to learn why you compare Qt (a user interface framework) to a programming language such as Visual Basic.

--
Regards,
Barry Nichols

stamostolias · May 21, 2011, 9:04am

Sorry, the difference between languages always confuses everyone. I use Qt only with C++. In Greek, Qt is called Πλατφόρμα Εφαρμογών (application platform), and I could not find a good English word for it.

> Also, I’d be interested to learn why you compare Qt (a user interface framework) to a programming language such as Visual Basic.

Sorry for this (I was in a rush), I meant Visual Studio.
Sometimes the same things in Greek are translated differently in English.