Good evening, dear Linux experts,
First of all, I am very happy that I found this great place. I like this forum very much because it has a great and supportive community, and I learn a lot from you folks here! Every question gets great reviewers, and each thread is a valuable learning resource.
I am new to Perl and fairly new to this board. I am currently working on a little parser: I want to parse a table. See the target URL:
Note: this page has a table of labels and values.
We need to provide something that uniquely identifies the table in question: either the content of its headers or its HTML attributes. In this case there is only one table in the document, so strictly speaking we do not even need to do that. But if I have to pass something to the constructor, I would pass the class of the table.
We do not want to read the table column by column. The first column of this table holds the labels and the second column holds the values. To get each label together with its value, we should process the table row by row.
Can it be done like this?
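#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Select the table by its class attribute.
my $te = HTML::TableExtract->new(
    attribs => { class => 'bp_ergebnis_tab_info' },
);
$te->parse_file('t.html');

# Walk the table row by row: cell 0 holds the label, cell 1 the value.
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        my ( $label, $value ) = map { $_ // '' } @$row;   # undef cells become ''
        print "$label: $value\n";
    }
}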
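Going row by row, each label can be paired with its value in a single pass. The sketch below collects those pairs into a hash using only core Perl. It assumes each row is an array reference with the label in cell 0 and the value in cell 1 (the layout described above); `rows_to_pairs` is a hypothetical helper name, not part of HTML::TableExtract.

```perl
use strict;
use warnings;

# Hypothetical helper: turn label/value rows into a hash reference.
# Each row is an array reference, e.g. one element of the list
# returned by $table->rows. Cell 0 is assumed to be the label,
# cell 1 the value.
sub rows_to_pairs {
    my @rows = @_;
    my %info;
    for my $row (@rows) {
        my ( $label, $value ) = @$row;
        next unless defined $label;    # skip rows without a label cell
        for ( $label, $value ) {
            next unless defined $_;    # cells can be undef
            s/^\s+//;                  # trim leading whitespace
            s/\s+$//;                  # trim trailing whitespace
        }
        next if $label eq '';          # skip rows whose label is blank
        $info{$label} = $value;
    }
    return \%info;
}
```

Inside the `for my $table ( $te->tables )` loop, something like `my $info = rows_to_pairs( $table->rows );` followed by `print YAML::Dump($info);` (with `use YAML;` loaded) would show the whole table as one mapping. Note that a hash does not keep the row order; if the order matters, an array of `[ $label, $value ]` pairs is the safer choice.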
Note: I want to parse a page like this one: Weitere Schulinformationen ("further school information").
So, for a first try, I saved the HTML of the page locally as t.html and ran the script against it.
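Instead of saving the page by hand, the local copy could also be fetched from Perl itself. A minimal sketch using the core module HTTP::Tiny (core since Perl 5.14); `save_page` is a hypothetical helper name, and since the target URL is not given in this post, the URL in the usage line below is only a placeholder.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: download $url into $file.
# mirror() sends If-Modified-Since when $file already exists,
# so an unchanged page is not downloaded again.
sub save_page {
    my ( $url, $file ) = @_;
    my $res = HTTP::Tiny->new->mirror( $url, $file );
    # mirror() reports success for 2XX and for 304 (not modified).
    die "Could not fetch $url: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $file;
}
```

Usage: `save_page( 'https://example.com/page.html', 't.html' );` and then `$te->parse_file('t.html')` as before. For `https://` URLs, HTTP::Tiny additionally needs IO::Socket::SSL and Net::SSLeay to be installed.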
Could you review the code and give me some hints?
I'd love to hear from you.
Regards,
db1