Transform a string into an array with regex, split or explode: I need some help with a simple string!

OK, I tried it on a clean 11.4 install and it just worked…:confused:

I've put a copy of mine up on my web server; can you grab it and try it, please?


wget http://www.muppetwifi.homeunix.net/openSUSE/get_table_data.pl

hi there - many thx

just a quick note - (more later today)

On openSUSE 11.3 (!!!) I get the following results:




    lfd. Nr. Schul- nummer Schulname Stra�e PLZ Ort Telefon Fax Schulart Webseite
    1 0401 M�dchenrealschule Marienburg,�Abenberg, der Di�zese Eichst�tt Marienburg 1 91183� Abenberg�  09178/509210  Realschulen  mrs-marienburg.homepage.t-online.de
    2 6581 Volksschule Abenberg�(Grundschule) G�ss�belstr. 2 91183� Abenberg�  09178/215 09178/905060 Volksschulen  home.t-online.de/home/vs-abenberg
    3 6913 Mittelschule Abenberg� G�ss�belstr. 2 91183� Abenberg�  09178/215 09178/905060 Volksschulen  home.t-online.de/home/vs-abenberg
    4 0402 Johann-Turmair-Realschule�Staatliche Realschule Abensberg Stadionstra�e 46 93326� Abensberg�  09443/9143-0,12,13 09443/914330 Realschulen  www.rs-abensberg.de
    5 3041 Cabrini-Schule Offenstetten, Priv. F�rderzentrum�F�rderschwerp. geist.Entwickl. d. Kath.Jugendf�rs. Am Schmiedweiher 8 93326� Abensberg�Offenstetten 09443/9188-3 09443/918855 Volksschulen zur sonderp�dog. F�rderung  www.cabrinischule.de
    6 3074 Private Berufsschule zur sonderp�d. F�rderung,�F�rderschwerpunkt Lernen, Abensberg Regensburger Stra�e 60 93326� Abensberg�  09443/709191 09443/709193 Berufsschulen zur sonderp�dog. F�rderung  www.berufsschule-abensberg.de
    7 3083 Pr�lat-Michael-Thaller-Schule, Priv. Sonderp�d.�F�rderzetrum Abensberg der Kath. Jugendf. Regensb. Regensburger Str. 58 93326� Abensberg�  09443/928500 09443/92850300 Volksschulen zur sonderp�dog. F�rderung  www.pmt-schule.de
    8 3253 Priv.Fachschule f.Heilerziehungspflege u. -pflege-�hilfe Abensberg d.Kath.Jugendf�rsorge d.Di�z.Reg. An den Sandwellen 124 93326� Abensberg�  09443/709174 09443/709-222 Hauswirtschaftliche Fachschulen  www.bbw-abendberg.de

with this code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TableExtract;
    use LWP::Simple;
    use Cwd;
    use POSIX qw(strftime);


    my $te = HTML::TableExtract->new;
    my $total_records = 0;
    my $suchbegriffe = "e";    # search term (Suchbegriffe)
    my $treffer = 50;          # hits per page (Treffer)
    my $range = 0;             # offset of the first hit on the current page
    my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
    my $processdir = "processing";
    my $counter = 50;
    my $displaydate = "";
    my $percent = 0;

    workDir();
    chdir $processdir or die "Cannot change to $processdir: $!";
    processURL();
    print "\nPress <enter> to continue\n";
    <>;
    $displaydate = strftime('%Y%m%d%H%M%S', localtime);
    my $outfile = "webdata_for_${suchbegriffe}_$displaydate.txt";
    open OUTFILE, '>', $outfile or die "Cannot open $outfile: $!";
    processData();
    close OUTFILE;
    print "Finished processing $total_records records...\n";
    print "Processed data saved to $ENV{HOME}/$processdir/$outfile\n";
    unlink 'processing.html';
    exit 0;

    # Fetch the first result page once and read the total number of hits from it
    sub processURL {
       my $url = "$url_to_process$suchbegriffe&a=$treffer&s=$range";
       print "\nProcessing $url\n";
       is_success( getstore($url, 'tempfile.html') ) or die "Unable to get $url";

       open my $fh, '<', 'tempfile.html' or die "Cannot open tempfile.html: $!";
       while ( <$fh> ) {
          # the sixth capture group is the total number of hits
          if ( /^.*?(Treffer \<b\>)(\d+)( - )(\d+)(<\/b> \w+ \w+ \<b\>)(\d+).*/ ) {
             $total_records = $6;
             print "Total records to process is $total_records\n";
          }
       }
       close $fh;
       unlink 'tempfile.html';
    }

    # Fetch the result pages $treffer hits at a time and write each table row to OUTFILE
    sub processData {
       $| = 1;    # unbuffered STDOUT so the progress line updates in place
       while ( $range <= $total_records ) {
          my $url = "$url_to_process$suchbegriffe&a=$treffer&s=$range";
          is_success( getstore($url, 'processing.html') ) or die "Unable to get $url";
          $te->parse_file('processing.html');
          my ($table) = $te->tables;
          for my $row ( $table->rows ) {
             cleanup(@$row);
             print OUTFILE "@$row\n";
          }
          print "Processed records $range to $counter\r";
          $counter += 50;
          $range   += 50;
          $te = HTML::TableExtract->new;    # fresh parser for the next page
       }
    }

    # Collapse every run of whitespace (including line breaks) in each cell to one space
    sub cleanup {
       for ( @_ ) {
          s/\s+/ /g if defined;
       }
    }

    # Work in ~/processing, creating it if necessary
    sub workDir {
       chdir or die "Cannot change to home directory: $!";
       if ( ! -d $processdir ) {
          mkdir("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
       }
    }



I'll come back later today!!

greetings

Hi
Can you have a look at this data, does it look ok?


wget http://www.muppetwifi.homeunix.net/openSUSE/webdata_for_e_20110218080626.txt.tar.bz2

hello again

thx for answering so quickly!!

First of all - many many thanks for the help!

I run openSUSE 11.3.

Thanks for offering the new data on your server: I downloaded it, extracted it with Ark and finally opened it with KWrite. See the results:

lfd. Nr. Schul- nummer Schulname Stra�e PLZ Ort Telefon Fax Schulart Webseite
1 0401 M�dchenrealschule Marienburg,�Abenberg, der Di�zese Eichst�tt Marienburg 1 91183� Abenberg�  09178/509210  Realschulen  mrs-marienburg.homepage.t-online.de 
2 6581 Volksschule Abenberg�(Grundschule) G�ss�belstr. 2 91183� Abenberg�  09178/215 09178/905060 Volksschulen  home.t-online.de/home/vs-abenberg 
3 6913 Mittelschule Abenberg� G�ss�belstr. 2 91183� Abenberg�  09178/215 09178/905060 Volksschulen  home.t-online.de/home/vs-abenberg 
4 0402 Johann-Turmair-Realschule�Staatliche Realschule Abensberg Stadionstra�e 46 93326� Abensberg�  09443/9143-0,12,13 09443/914330 Realschulen  www.rs-abensberg.de
5 3041 Cabrini-Schule Offenstetten, Priv. F�rderzentrum�F�rderschwerp. geist.Entwickl. d. Kath.Jugendf�rs. Am Schmiedweiher 8 93326� Abensberg�Offenstetten 09443/9188-3 09443/918855 Volksschulen zur sonderp�dog. F�rderung  www.cabrinischule.de
6 3074 Private Berufsschule zur sonderp�d. F�rderung,�F�rderschwerpunkt Lernen, Abensberg Regensburger Stra�e 60 93326� Abensberg�  09443/709191 09443/709193 Berufsschulen zur sonderp�dog. F�rderung  www.berufsschule-abensberg.de
7 3083 Pr�lat-Michael-Thaller-Schule, Priv. Sonderp�d.�F�rderzetrum Abensberg der Kath. Jugendf. Regensb. Regensburger Str. 58 93326� Abensberg�  09443/928500 09443/92850300 Volksschulen zur sonderp�dog. F�rderung  www.pmt-schule.de
8 3253 Priv.Fachschule f.Heilerziehungspflege u. -pflege-�hilfe Abensberg d.Kath.Jugendf�rsorge d.Di�z.Reg. An den Sandwellen 124 93326� Abensberg�  09443/709174 09443/709-222 Hauswirtschaftliche Fachschulen  www.bbw-abendberg.de
9 3656 Aventinus-Volksschule Abensberg�(Grundschule) R�merstr. 2 93326� Abensberg�  09443/491 09443/6241 Volksschulen  www.aventinus-gs-abensberg.de
10 3657 Aventinus-Mittelschule Abensberg� R�merstra�e 12 93326� Abensberg�  09443/6439 09443/3440 Volksschulen  www.hs-abensberg.de


**Some ideas / thoughts** that come to mind:

Shouldn't we normalize everything to UTF-8?

I think the data is encoded in UTF-8, both on input and output. But the output text replaces various characters with EF BF BD, which is UTF-8 for Unicode U+FFFD, ‘REPLACEMENT CHARACTER’. As long as we open all files as UTF-8, all should be well.
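For reference, EF BF BD is what comes out when bytes that are not valid UTF-8 are decoded as UTF-8 and then written back out. A minimal Perl sketch with the core Encode module, assuming the source bytes are ISO-8859-1 (Latin-1), where the ä of Mädchen is the single byte 0xE4:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Mädchen" as stored in ISO-8859-1: the "ä" is the single byte 0xE4.
my $latin1_bytes = "M\xE4dchen";

# 0xE4 would start a three-byte UTF-8 sequence, but the next byte ("d") does not
# continue it, so decoding as UTF-8 fails there and Encode (with its default
# CHECK) substitutes U+FFFD REPLACEMENT CHARACTER.
my $chars = decode('UTF-8', $latin1_bytes);

# Encoding the result as UTF-8 again turns U+FFFD into the bytes EF BF BD.
my $utf8_bytes = encode('UTF-8', $chars);

print unpack('H*', $utf8_bytes), "\n";    # hex dump of the resulting bytes
```

The same happens to ö, ü and ß, which would fit the � characters in the pasted output above if the files are really Latin-1 rather than UTF-8.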

What do you think? Do the characters look better on your side, or not…?

Perhaps I have a problem with my openSUSE 11.3!? Possible!?

Looking forward to hearing from you

db1

Hi
What encoding is KWrite set to?

What do you get for the following command:


locale

The big issue is I can’t duplicate, it’s all OK for me when I process the data. I do however see the � in the text you post.

Maybe another user can confirm?

hello dear Malcolm

Where exactly can I see how KWrite is set up? I had a quick look in KWrite… but could not find a particular place to check it…

BTW, I ran the locale command; see the results here:

martin@suse-linux:~> locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
martin@suse-linux:~>

BTW, I think the idea of having a third person confirm it is a very, very good one!

I'll try to find someone here…

many many
greetings. db1

Note: I'll come back later this evening!!! Sure thing!! :slight_smile:

On Fri February 18 2011 02:36 pm, dilbertone wrote:

>
> hello dear Malcolm
>
> malcolmlewis;2291796 Wrote:
>> Hi
>> What is kwrite set to?
>>
<snip>
>> The big issue is I can’t duplicate, it’s all OK for me when I process
>> the data. I do however see the � in the text you post.
>>
>> Maybe another user can confirm?
>
<snip>
>
> btw - i think the idea of confirmation by some third ones is a very
> very good one!
>

I also see “&#65533;”, i.e. U+FFFD, the Unicode replacement character.


P. V.
“We’re all in this together, I’m pulling for you.” Red Green

Hi
Thanks :slight_smile:
So what about your locale command?
Can you download the file from a previous post untar and inspect?


Cheers Malcolm °¿° (Linux Counter #276890)
SUSE Linux Enterprise Desktop 11 (x86_64) Kernel 2.6.32.27-0.2-default
up 7 days 19:06, 4 users, load average: 1.67, 0.49, 0.23
GPU GeForce 8600 GTS Silent - Driver Version: 260.19.26

On Fri February 18 2011 05:19 pm, malcolmlewis wrote:

>

>
> Hi
> Thanks :slight_smile:
> So what about your locale command?
> Can you download the file from a previous post untar and inspect?
>
Malcolm;

My locale:


xxxxxx@Hal2:~> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The first few lines of webdata_for_e_20110218080626.txt from KWrite:


lfd. Nr. Schul- nummer Schulname Stra?e PLZ Ort Telefon Fax Schulart Webseite
1 0401 M?dchenrealschule Marienburg,?Abenberg, der Di?zese Eichst?tt Marienburg 1 91183? Abenberg?  09178/509210  Realschulen  mrs-marienburg.homepage.t-online.de
2 6581 Volksschule Abenberg?(Grundschule) G?ss?belstr. 2 91183? Abenberg?  09178/215 09178/905060 Volksschulen  home.t-online.de/home/vs-abenberg
3 6913 Mittelschule Abenberg? G?ss?belstr. 2 91183? Abenberg?  09178/215 09178/905060 Volksschulen  home.t-online.de/home/vs-abenberg
4 0402 Johann-Turmair-Realschule?Staatliche Realschule Abensberg Stadionstra?e 46 93326? Abensberg?  09443/9143-0,12,13 09443/914330 Realschulen  www.rs-abensberg.de

This looks good.

P. V.

On Fri February 18 2011 06:06 pm, PV wrote:

> On Fri February 18 2011 05:19 pm, malcolmlewis wrote:
<snip>

Setting UTF-8 for Knode I have:



lfd. Nr. Schul- nummer Schulname Stra�e PLZ Ort Telefon Fax Schulart Webseite
1 0401 M�dchenrealschule Marienburg,�Abenberg, der Di�zese Eichst�tt Marienburg 1 91183� Abenberg�  09178/509210  Realschulen  mrs-marienburg.homepage.t-online.de
2 6581 Volksschule Abenberg�(Grundschule) G�ss�belstr. 2 91183� Abenberg�  09178/215 09178/905060 Volksschulen  home.t-online.de/home/vs-abenberg
3 6913 Mittelschule Abenberg� G�ss�belstr. 2 91183� Abenberg�  09178/215 09178/905060 Volksschulen  home.t-online.de/home/vs-abenberg
4 0402 Johann-Turmair-Realschule�Staatliche Realschule Abensberg Stadionstra�e 46 93326� Abensberg�  09443/9143-0,12,13 09443/914330 Realschulen  www.rs-abensberg.de
5 3041 Cabrini-Schule Offenstetten, Priv. F�rderzentrum�F�rderschwerp. geist.Entwickl. d. Kath.Jugendf�rs. Am Schmiedweiher 8 93326� Abensberg�Offenstetten 09443/9188-3 09443/918855 Volksschulen zur sonderp�dog. F�rderung  www.cabrinischule.de



P. V.

On Fri February 18 2011 06:23 pm, PV wrote:

> On Fri February 18 2011 06:06 pm, PV wrote:
>
>> On Fri February 18 2011 05:19 pm, malcolmlewis wrote:
> <snip>
>
malcolm;

The characters that are not rendering correctly for us are mainly the umlauted vowels (ä, ö, ü) and the scharfes S (ß). However, there are a few characters around Abenberg that I think should be white spaces or maybe punctuation. I’m guessing these are OK in de_DE.UTF-8 (German UTF-8), but I’m not sure how those are coded.

P. V.

In gnome there’s a character map you can use to find either the code or the character. Usually under Common or Latin for me, but for you maybe German.

On Fri February 18 2011 10:06 pm, tararpharazon wrote:

>> I’m guessing these are OK in de_DE.UTF-8 (German UTF-8), but I’m not sure
>> how those are coded.
>> –
>> P. V.
>> “We’re all in this together, I’m pulling for you.” Red Green

> In gnome there’s a character map you can use to find either the code
> or the character. Usually under Common or Latin for me, but for you
> maybe German.
>
Setting the character set to ISO 8859-1 does show the information correctly. Not sure if it will display properly when posted, but here it is. Looks good at my end.


lfd. Nr. Schul- nummer Schulname Straße PLZ Ort Telefon Fax Schulart Webseite
1 0401 Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt Marienburg 1 91183  Abenberg   09178/509210  Realschulen  mrs-marienburg.homepage.t-online.de


P. V.

That looks perfect here :wink:


Cheers Malcolm °¿° (Linux Counter #276890)

hello Malcolm, venzkep and tararpharazon,

Many, many thanks for the engaged postings… I am happy about the discussion here. :wink:

lfd. Nr. Schul- nummer Schulname Straße PLZ Ort Telefon Fax Schulart Webseite
1 0401 Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt Marienburg 1 91183 Abenberg 09178/509210 Realschulen mrs-marienburg.homepage.t-online.de

That looks perfect here too! :wink:

It is great to follow your discussion (above) about characters and encoding.
Something went wrong with the decoding of the characters that were not rendering correctly (mainly the umlauted vowels (ae, ue, oe) and scharfes (ss)).

And to tell the truth, I had overlooked something. An enlightening discussion here:

@Malcolm: I found out more: my KWrite was set to Unicode UTF-8.
If I opened your files, or files created by your scripts, with KWrite, I got an error: “corrupted UTF-8 file opened”.

**The popup says:** “The file …xyz. was opened as a UTF-8 file, but contains corrupted characters. This is a read-only version. Just use the correct character set and continue.”

**After setting it to ISO 8859-1** all looks great… :wink:

lfd. Nr. Schul- nummer Schulname Straße PLZ Ort Telefon Fax Schulart Webseite
1 0401 Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt Marienburg 1 91183 Abenberg 09178/509210 Realschulen mrs-marienburg.homepage.t-online.de
2 6581 Volksschule Abenberg (Grundschule) Güssübelstr. 2 91183 Abenberg 09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
3 6913 Mittelschule Abenberg Güssübelstr. 2 91183 Abenberg 09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
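Assuming the saved files are ISO-8859-1, as the KWrite test suggests, they could also be converted once to UTF-8 so that KWrite (or anything else expecting UTF-8) opens them without the warning; from the shell, iconv -f ISO-8859-1 -t UTF-8 does the same job. A minimal Perl sketch with the core Encode module; latin1_to_utf8 and convert_file are hypothetical helper names:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode a string of ISO-8859-1 bytes as UTF-8 bytes.
# Every byte is a valid Latin-1 character, so this never produces U+FFFD.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# Hypothetical helper: convert a whole file, reading and writing raw bytes.
sub convert_file {
    my ($in, $out) = @_;
    open my $ifh, '<:raw', $in or die "Cannot read $in: $!";
    my $data = do { local $/; <$ifh> } // '';
    close $ifh;
    open my $ofh, '>:raw', $out or die "Cannot write $out: $!";
    print {$ofh} latin1_to_utf8($data);
    close $ofh or die "Cannot close $out: $!";
}

# Usage (the output file name is only an example):
# convert_file('webdata_for_e_20110218080626.txt', 'webdata_for_e_utf8.txt');
```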

BTW, one last question regarding the parsing: is there any chance to get some separators into the output that separate the table columns…?

Let us take a look at the table (KM-Bayern - Suche in der bayerischen Schuldatenbank). Note: in the end I want to store the data in a MySQL database, so it would be great to have some separators (commas, tabs or something else); tab-separated values or comma-separated values are handy formats to work with…

(Here is the data from the following site: KM-Bayern - Suche in der bayerischen Schuldatenbank)

**Legend:** lfd. Nr. Schul- nummer Schulname Straße PLZ Ort Telefon Fax Schulart Webseite

1 0401 Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt Marienburg 1 91183 Abenberg 09178/509210 Realschulen mrs-marienburg.homepage.t-online.de
**2** 6581 Volksschule Abenberg (Grundschule) Güssübelstr. 2 91183 Abenberg 09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
6 3074 Private Berufsschule zur sonderpäd. Förderung, Förderschwerpunkt Lernen, Abensberg Regensburger Straße 60 93326 Abensberg 09443/709191 09443/709193 Berufsschulen zur sonderpädog. Förderung www.berufsschule-abensberg.de

Well, I need to have those lines divided into at least three columns; take record 2 as an example:

name: Volksschule Abenberg (Grundschule)
street: Güssübelstr. 2
postal-code and town: 91183 Abenberg
telephone and fax: 09178/215 09178/905060
type of school: Volksschulen
website: home.t-online.de/home/vs-abenberg

Or even better: could I **divide** the postal code and town into two separate columns? Is this possible?
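Splitting the postal code from the town should be straightforward, because German postal codes (PLZ) are always five digits. A minimal sketch, assuming the cell text looks like the examples above ("91183 Abenberg"), possibly with non-breaking spaces, which the stray characters after the postal codes most likely are; split_plz_ort is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: split "91183 Abenberg" into ("91183", "Abenberg").
# Non-breaking spaces (0xA0) are turned into ordinary spaces first, because
# \s does not always match them in byte strings.
sub split_plz_ort {
    my ($text) = @_;
    (my $clean = defined $text ? $text : '') =~ tr/\xA0/ /;
    if ($clean =~ /^\s*(\d{5})\s+(.*?)\s*$/) {
        return ($1, $2);
    }
    return (undef, $clean);    # no five-digit code found: keep everything as the town
}

my ($plz, $ort) = split_plz_ort("91183 Abenberg");
print "PLZ: $plz, Ort: $ort\n";    # prints: PLZ: 91183, Ort: Abenberg
```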

By the way, look at two particular records, the first and the sixth record of the page mentioned above (here I only show the names of the schools):

1 0401 Mädchenrealschule Marienburg, Abenberg,
6 3074 Private Berufsschule zur sonderpäd. Förderung, Förderschwerpunkt Lernen, Abensberg

Those two records **have commas inside the name**; does this make it difficult to write a parser that produces CSV format?
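Commas inside a field are not a problem for CSV as such: the usual convention (as in RFC 4180) is to wrap a field containing a comma, double quote or line break in double quotes and to double any quote inside it. In practice a module such as Text::CSV takes care of this; the hand-written csv_field/csv_line helpers below are hypothetical and only illustrate the rule, using the phone field of record 4, which also contains commas:

```perl
use strict;
use warnings;

# Illustration only: quote one field if it contains a comma, a double quote
# or a line break, doubling any double quotes inside it.
sub csv_field {
    my ($field) = @_;
    $field = '' unless defined $field;
    if ($field =~ /[",\r\n]/) {
        $field =~ s/"/""/g;
        return qq{"$field"};
    }
    return $field;
}

# Join already-quoted fields with commas into one CSV line.
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

print csv_line('4', '0402', '93326', 'Abensberg', '09443/9143-0,12,13', '09443/914330');
# prints: 4,0402,93326,Abensberg,"09443/9143-0,12,13",09443/914330
```

School names with commas, like records 1 and 6, get quoted the same way, so a CSV reader (or MySQL's LOAD DATA with OPTIONALLY ENCLOSED BY '"') still sees them as one field.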

Any idea how to do this in Perl? If possible, it would be just great!!

many many thx for a hint regarding this little issue - besides this all is great and fascinating!

dilbertone…:wink:

Hi
I'll look at it after the weekend (unless someone else jumps in), but I would consider using a character that's not common, maybe a * or :, rather than a comma. I think I can do a grid reference as well, which may make it easier to split out the required fields.
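Following the idea of a separator that does not occur in the data: a tab is a common choice, and it is also the default field terminator of MySQL's LOAD DATA INFILE, which suits the plan to load the data into MySQL. A minimal sketch; to_delimited is a hypothetical helper that replaces any occurrence of the separator, or a line break, inside a field with a space so that every record stays on one line:

```perl
use strict;
use warnings;

# Hypothetical helper: join fields with a separator that should not occur in
# the data. Separators and line breaks inside a field become single spaces.
sub to_delimited {
    my ($sep, @fields) = @_;
    my @clean = map {
        my $f = defined $_ ? $_ : '';
        $f =~ s/\Q$sep\E|[\r\n]+/ /g;
        $f;
    } @fields;
    return join($sep, @clean) . "\n";
}

print to_delimited("\t", '2', '6581', 'Volksschule Abenberg (Grundschule)', '91183', 'Abenberg');
print to_delimited('*', '4', '0402', '09443/9143-0,12,13');
# the second line prints: 4*0402*09443/9143-0,12,13
```

The commas in names and phone numbers pass through untouched because only the chosen separator is treated specially.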

hello dear Malcolm :wink:

great to hear from you!!! I am very very glad! :wink:

By the way, this is a great thread, with a steep learning curve for me. I have learned a lot about UTF-8 and ISO 8859-1.

And of course something about handling data on Linux.

But by far the most impressive thing I learned is that this community here is so supportive. I am overwhelmed by this experience. This forum has so many, many great folks.

Thanks for all the great help! Thanks to you, Malcolm, and all the others…!!

Again - great to hear your ideas.

I am very, very glad! Well, what about using Text::CSV? I installed it and played around with this module…

But unfortunately I have some character issues… again… The good thing is that it gives back the results in CSV format.
But the spider logic is not as nice as in your script. Can we combine them and put this into yours…

What do you think ?!?

    #!/usr/bin/perl
    use warnings;
    use strict;
    use LWP::Simple;
    use HTML::TableExtract;
    use Text::CSV;

    my $html = get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
    defined $html or die "Unable to get page";
    $html =~ tr/\r//d;        # strip the carriage returns
    $html =~ s/&nbsp;/ /g;    # expand the non-breaking spaces

    my $te = HTML::TableExtract->new;
    $te->parse($html);

    # the six cells of each table row
    my @cols = qw(
        rownum
        number
        name
        phone
        type
        website
    );

    # the ten fields written to the CSV output
    my @fields = qw(
        rownum
        number
        name
        street
        postal
        town
        phone
        fax
        type
        website
    );

    my $csv = Text::CSV->new({ binary => 1 });

    foreach my $ts ($te->table_states) {
        foreach my $row ($ts->rows) {

            # trim leading/trailing whitespace from base fields
            s/^\s+//, s/\s+$// for @$row;

            # load the fields into the hash using a "hash slice"
            my %h;
            @h{@cols} = @$row;

            # derive some fields from base fields, again using a hash slice:
            # the name cell holds name, street, postal code and town on separate
            # lines, the phone cell holds phone and fax
            @h{qw/name street postal town/} = split /\n+/, $h{name};
            @h{qw/phone fax/} = split /\n+/, $h{phone};

            # trim leading/trailing whitespace from derived fields
            s/^\s+//, s/\s+$// for @h{qw/name street postal town/};

            $csv->combine(@h{@fields});
            print $csv->string, "\n";
        }
    }


Malcolm, what do you think about this… This part can produce the CSV format. We should combine it with your spider logic.

Where should I add this to do the combination? Is the line to look at in your script the one where = new HTML::TableExtract(); is called?!

I am very interested in what you think about this…

greetings
dilbertone:)

BTW, the code spits out the following (again we have the character problem, don't we… :wink:). I have copied and pasted this from the command line:



"lfd. Nr.","Schul-
nummer",Schulname,Stra�e,PLZ,Ort,Telefon,Fax,Schulart,Webseite
1,0401,"M�dchenrealschule Marienburg, Abenberg, der Di�zese Eichst�tt","Marienburg 1",91183,Abenberg,09178/509210,,Realschulen,mrs-marienburg.homepage.t-online.de
2,6581,"Volksschule Abenberg (Grundschule)","G�ss�belstr. 2",91183,Abenberg,09178/215,09178/905060,Volksschulen,home.t-online.de/home/vs-abenberg
3,6913,"Mittelschule Abenberg","G�ss�belstr. 2",91183,Abenberg,09178/215,09178/905060,Volksschulen,home.t-online.de/home/vs-abenberg
4,0402,"Johann-Turmair-Realschule Staatliche Realschule Abensberg","Stadionstra�e 46",93326,Abensberg,"09443/9143-0,12,13",09443/914330,Realschulen,www.rs-abensberg.de


See the separation: nice! But look at the characters… what is this? Again the same thing…?
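The � characters in this CSV output most likely have the same cause as before. In current libwww-perl versions, LWP::Simple's get returns the page as decoded Perl characters, and printing characters such as ß to STDOUT without an output encoding writes them as single Latin-1 bytes, which a UTF-8 terminal shows as �. A common fix, assuming that is what happens here, is one line near the top of the script: binmode STDOUT, ':encoding(UTF-8)'; The sketch below (core Perl only) writes to in-memory filehandles instead of STDOUT just so the two results can be compared:

```perl
use strict;
use warnings;

# "Straße" as Perl characters (ß is U+00DF), as it would come out of a decoded page.
my $text = "Stra\x{DF}e";

# Without an encoding layer, characters below U+0100 are written as single
# Latin-1 bytes: ß becomes the byte 0xDF, which a UTF-8 terminal shows as �.
open my $raw, '>', \my $raw_bytes or die $!;
print {$raw} $text;
close $raw;

# With a UTF-8 layer (in the real script: binmode STDOUT, ':encoding(UTF-8)';)
# the same character is written as the two bytes C3 9F.
open my $utf8, '>:encoding(UTF-8)', \my $utf8_bytes or die $!;
print {$utf8} $text;
close $utf8;

printf "without layer: %s\nwith UTF-8 layer: %s\n",
    unpack('H*', $raw_bytes), unpack('H*', $utf8_bytes);
```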

Hi
Looks good :slight_smile:

Have a look here at what options are available for HTML::TableExtract; my thought was to use the gridmap?
HTML::TableExtract - search.cpan.org

This would eliminate the CSV part and just fire the data straight into a spreadsheet using the relevant Perl module.

hello Malcom :slight_smile:

many thanks for the hint! It’s a real pleasure to be on this great forum …

This looks great; the various options are very, very impressive (some go over my head as a Perl beginner). But I am prepared to learn…

**BTW:** At the moment I am trying to figure out where the above code (the one with the Text::CSV part) needs to go in your script in order to make the best of both!?

Question: can we **identify the point at which to merge** the one into the other…!? That would be amazing… I hope I have made clear what I have in mind…!?

Can we get the benefits of both parts (scripts) by merging them into one? So the question is: where do I put the CSV script into your very sophisticated script:

Question: is it here… ?!:


          $te = HTML::TableExtract->new;
       }
    }

    sub cleanup() {
       for ( @_ ) {
          s/\s+/ /g;
       }
    }

    sub workDir() {
    # Use home directory to process data
    chdir or die "$!";
    if ( ! -d $processdir ) {
       mkdir ("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
       }
    }  
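One way to combine the two scripts, as a sketch under assumptions rather than a tested solution: keep the spider part (workDir, processURL and the paging loop in processData) unchanged, and move the per-row splitting from the Text::CSV script into a helper that processData calls for each row instead of cleanup. row_to_fields is a hypothetical name; it assumes each row has the six cells listed in @cols of the Text::CSV script, with line breaks inside the name and phone cells (which is what that script's split relies on). The spider script would also need use Text::CSV; at the top.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one HTML::TableExtract row with six cells
# (rownum, number, name block, phone block, type, website) into the ten fields
# rownum, number, name, street, postal, town, phone, fax, type, website.
sub row_to_fields {
    my ($row) = @_;

    # copy the six cells, map non-breaking spaces (0xA0) to spaces and trim them
    my @cells = map {
        my $c = defined $row->[$_] ? $row->[$_] : '';
        $c =~ tr/\xA0/ /;
        $c =~ s/^\s+|\s+$//g;
        $c;
    } 0 .. 5;
    my ($rownum, $number, $name_block, $phone_block, $type, $website) = @cells;

    # name block: name, street, postal code, town on separate lines;
    # phone block: phone and fax (split on line breaks plus surrounding blanks)
    my ($name, $street, $postal, $town) = split /\s*\n\s*/, $name_block;
    my ($phone, $fax)                   = split /\s*\n\s*/, $phone_block;

    my @fields = ($rownum, $number, $name, $street, $postal, $town,
                  $phone, $fax, $type, $website);
    for (@fields) {
        $_ = '' unless defined;    # missing parts become empty fields
        s/\s+/ /g;                 # collapse any remaining runs of whitespace
    }
    return @fields;
}

# In processData the row loop could then become (with the Text::CSV object
# created once, before the while loop: my $csv = Text::CSV->new({ binary => 1 });):
#
#   for my $row ( $table->rows ) {
#      $csv->combine( row_to_fields($row) ) or warn "Cannot combine row\n";
#      print OUTFILE $csv->string, "\n";
#   }
```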

Malcolm, I would be glad if you could help me here…

**The other way, method B:** I am trying to work out how to transfer the online table into a spreadsheet using the Perl module.

I'll come back later this weekend and report all my findings…

greetings
dilbertone :wink:

Hello dear Malcolm,

it would be a great, great pleasure if you could help me merge both scripts…

I am just a novice, and the Text::CSV script goes somewhat over my head. The same goes for yours… Can you help me merge both into one, so that I have one script that does the spidering and parsing and, besides this, also the splitting into comma-separated values…

I would love it if you could help me here…

many many thanks for all your help
Greetings
dilbertone:)