
Thread: Cannot open a 2 MB file in .csv format - a file dilemma!!

  1. #1


    Good day, dear Linux fans,

    I cannot open a 2 MB file in .csv format - a file dilemma!!

    What can I do now?!

    Note: I run a notebook with 4 GB RAM.
    dilbert ;-)
    Wordpress-development - a Toolset: wpgear.org

  2. #2
    Join Date: Feb 2010 | Location: Germany | Posts: 4,654


    What program do you use to open it?

    A simple text editor like kwrite or gedit should have no problem opening
    such a small file.
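    If no editor is at hand, the file can also be inspected from a terminal (less FILE works for browsing). The helper below is a minimal sketch, assuming bash and GNU coreutils; peek_csv is a hypothetical name, not an existing command. It prints the line count and the first few lines, which is often enough to see whether the file is plain text at all.

```shell
#!/bin/bash
# peek_csv: hypothetical helper to look at a CSV/TSV file without a GUI editor.
# Usage: peek_csv FILE [N]   (N = number of lines to show, default 5)
peek_csv() {
    local file=$1 n=${2:-5}
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    # Line count first (tr strips the padding some wc versions add) ...
    echo "lines: $(wc -l < "$file" | tr -d ' ')"
    # ... then the first N lines of the file.
    head -n "$n" -- "$file"
}
```

    Usage: peek_csv restaurants.csv 10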

    --
    PC: oS 11.4 x86_64 | Intel Core i7-2600@3.40GHz | 16GB | KDE 4.8.2 | GeForce GT 420
    Eee PC 1201n: oS 12.1 x86_64 | Intel Atom 330@1.60GHz | 3GB | KDE 4.8.2 | nVidia ION
    eCAFE 800: oS 12.1 i586 | AMD Geode LX 800@500MHz | 512MB | KDE 3.5.10 | xf86-video-geode

  3. #3


    Hello dear Martin, good day,

    Great to hear from you; that is very good, since I have some questions about handling the XML parser you developed last week. Here are some additional points about the XSLT stylesheet you developed here: http://forums.opensuse.org/english/g...work-data.html

    You remember that we split the large file downloaded from here: Index of /pub/misc/openstreetmap/download.geofabrik.de


    We split the file with xml_split (which was a great idea; thank you very much for this suggestion!)

    One further thought: since the Germany file is extremely large, the XSLT
    processing will use a tremendous amount of memory; my PC with 16GB starts
    to swap with the one-shot solution.
    I think some further preprocessing is needed to split the source XML
    file into smaller XML files, or to remove useless parts before processing it.

    See here: http://forums.opensuse.org/english/g...rk-data-2.html

    There is a command available on openSUSE to split XML files, named
    xml_split (it is part of the package perl-XML-Twig; if you have not
    installed it yet, install it).
    Try to run the following command (I hope you have enough hard disk space,
    since the output is roughly 20GB).
    Note: I split the file with xml_split and got back a lot of files (germany-000.xml to germany-214.xml); each one is 96 MB.


    Code:
    linux-wyee:/home/martin/gis/german_poi # ls
    .directory       germany-020.xml  germany-041.xml  germany-062.xml  germany-083.xml  germany-104.xml  germany-125.xml  germany-146.xml  germany-167.xml  germany-188.xml  germany-209.xml
    germany-000.xml  germany-021.xml  germany-042.xml  germany-063.xml  germany-084.xml  germany-105.xml  germany-126.xml  germany-147.xml  germany-168.xml  germany-189.xml  germany-210.xml
    germany-001.xml  germany-022.xml  germany-043.xml  germany-064.xml  germany-085.xml  germany-106.xml  germany-127.xml  germany-148.xml  germany-169.xml  germany-190.xml  germany-211.xml
    germany-002.xml  germany-023.xml  germany-044.xml  germany-065.xml  germany-086.xml  germany-107.xml  germany-128.xml  germany-149.xml  germany-170.xml  germany-191.xml  germany-212.xml
    germany-003.xml  germany-024.xml  germany-045.xml  germany-066.xml  germany-087.xml  germany-108.xml  germany-129.xml  germany-150.xml  germany-171.xml  germany-192.xml  germany-213.xml
    germany-004.xml  germany-025.xml  germany-046.xml  germany-067.xml  germany-088.xml  germany-109.xml  germany-130.xml  germany-151.xml  germany-172.xml  germany-193.xml  germany-214.xml
    germany-005.xml  germany-026.xml  germany-047.xml  germany-068.xml  germany-089.xml  germany-110.xml  germany-131.xml  germany-152.xml  germany-173.xml  germany-194.xml  germany.osm.bz2
    germany-006.xml  germany-027.xml  germany-048.xml  germany-069.xml  germany-090.xml  germany-111.xml  germany-132.xml  germany-153.xml  germany-174.xml  germany-195.xml  restaurants,csv
    germany-007.xml  germany-028.xml  germany-049.xml  germany-070.xml  germany-091.xml  germany-112.xml  germany-133.xml  germany-154.xml  germany-175.xml  germany-196.xml  restaurants-001.csv
    germany-008.xml  germany-029.xml  germany-050.xml  germany-071.xml  germany-092.xml  germany-113.xml  germany-134.xml  germany-155.xml  germany-176.xml  germany-197.xml  restaurants.xslt
    germany-009.xml  germany-030.xml  germany-051.xml  germany-072.xml  germany-093.xml  germany-114.xml  germany-135.xml  germany-156.xml  germany-177.xml  germany-198.xml  restaurants.xslt~
    germany-010.xml  germany-031.xml  germany-052.xml  germany-073.xml  germany-094.xml  germany-115.xml  germany-136.xml  germany-157.xml  germany-178.xml  germany-199.xml  restaurants_2.xslt
    germany-011.xml  germany-032.xml  germany-053.xml  germany-074.xml  germany-095.xml  germany-116.xml  germany-137.xml  germany-158.xml  germany-179.xml  germany-200.xml
    germany-012.xml  germany-033.xml  germany-054.xml  germany-075.xml  germany-096.xml  germany-117.xml  germany-138.xml  germany-159.xml  germany-180.xml  germany-201.xml
    germany-013.xml  germany-034.xml  germany-055.xml  germany-076.xml  germany-097.xml  germany-118.xml  germany-139.xml  germany-160.xml  germany-181.xml  germany-202.xml
    germany-014.xml  germany-035.xml  germany-056.xml  germany-077.xml  germany-098.xml  germany-119.xml  germany-140.xml  germany-161.xml  germany-182.xml  germany-203.xml
    germany-015.xml  germany-036.xml  germany-057.xml  germany-078.xml  germany-099.xml  germany-120.xml  germany-141.xml  germany-162.xml  germany-183.xml  germany-204.xml
    germany-016.xml  germany-037.xml  germany-058.xml  germany-079.xml  germany-100.xml  germany-121.xml  germany-142.xml  germany-163.xml  germany-184.xml  germany-205.xml
    germany-017.xml  germany-038.xml  germany-059.xml  germany-080.xml  germany-101.xml  germany-122.xml  germany-143.xml  germany-164.xml  germany-185.xml  germany-206.xml
    germany-018.xml  germany-039.xml  germany-060.xml  germany-081.xml  germany-102.xml  germany-123.xml  germany-144.xml  germany-165.xml  germany-186.xml  germany-207.xml
    germany-019.xml  germany-040.xml  germany-061.xml  germany-082.xml  germany-103.xml  germany-124.xml  germany-145.xml  germany-166.xml  germany-187.xml  germany-208.xml
    linux-wyee:/home/martin/gis/german_poi #

    And as mentioned above, the XSLT file has been overhauled. This is what we have now:


    Code:
    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:output method="text" encoding="utf-8" />
    
        <xsl:template match="/">
            <xsl:apply-templates />
        </xsl:template>
    
        <xsl:template match="osm">
            <xsl:apply-templates />
        </xsl:template>
    
        <xsl:template match="node[tag[@k='amenity' and @v='restaurant']]">
            <xsl:value-of select="./@id"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="./@lat"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="./@lon"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="./tag[@k = 'name']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="./tag[@k = 'wheelchair']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="./tag[@k = 'website']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="./tag[@k = 'addr:country']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="./tag[@k = 'addr:street']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="./tag[@k = 'addr:city']/@v"/>
            <xsl:text>&#x09;</xsl:text>
            <xsl:value-of select="./tag[@k = 'addr:housenumber']/@v"/>
            <xsl:text>&#x0A;</xsl:text>
        </xsl:template>
    
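        <!-- all other nodes: no output (a plain "node" pattern has a lower
             default priority than the restaurant pattern above) -->
        <xsl:template match="node" />
    
        <!-- drop the whitespace text nodes the built-in rules would copy -->
        <xsl:template match="text()" />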
    
    </xsl:stylesheet>
    As you can see, more tags are covered now, so the parsing process produces a larger database.
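    The stylesheet writes ten tab-separated fields per restaurant (id, lat, lon, name, wheelchair, website, addr:country, addr:street, addr:city, addr:housenumber). A quick sanity check on the output, sketched here with awk under a hypothetical function name, is to count how many lines have each number of fields: a clean result is a single row ending in 10, while blank or whitespace-only lines show up with 0 or 1 fields.

```shell
#!/bin/bash
# field_histogram: hypothetical helper. Reads tab-separated text from the
# given files (or stdin) and prints "<number of lines> <fields per line>",
# sorted by the field count.
field_histogram() {
    awk -F'\t' '{ count[NF]++ }
                END { for (n in count) print count[n], n }' "$@" |
        sort -k2,2n
}
```

    Usage: field_histogram restaurants.csv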

    But I think I will have a problem if I run the script for all the files and collect the results in a single file. Nonetheless, I have to admit that it would be very nice to run it once and get all the results in a single file. Martin, what would you suggest here? Would you put the results in a single file, or how would you go about running the XSLT processing? Handling such processes is a challenge and goes somewhat over my head...

    Any pointer would be greatly appreciated...


    greetings
    dilbert


    BTW:
    Last week we had this solution:
    Code:
    
    <xsl:stylesheet version = '1.0'
            xmlns="http://www.w3.org/1999/xhtml"
            xmlns:xml_split="http://xmltwig.com/xml_split"
            xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
    
        <xsl:output method="text" encoding="UTF-8"/>
        <xsl:template match="/">
    
                <xsl:for-each select="xml_split:root/node/tag[@k='amenity' and @v='restaurant']">
                <xsl:value-of select="../@id"/>
                <xsl:text>&#x09;</xsl:text>
                <xsl:value-of select="../@lat"/>
                <xsl:text>&#x09;</xsl:text>
                <xsl:value-of select="../@lon"/>
                <xsl:text>&#x09;</xsl:text>
                <xsl:for-each select="../tag[@k='name']">
                    <xsl:value-of select="@v"/>
                </xsl:for-each>
                <xsl:text>&#x0A;</xsl:text>
            </xsl:for-each>
        </xsl:template>
    
    </xsl:stylesheet>
    with these changes compared to the original stylesheet:


    Code:
    xmlns:xml_split="http://xmltwig.com/xml_split"
    and this one:
    Code:
        <xsl:for-each select="xml_split:root/node/tag[@k='amenity' and @v='restaurant']">
    Apart from that, nothing has changed; the essential thing is that the output (result) file is much larger. How would you organize the processing, and the results as CSV? This is probably too big for one CSV spreadsheet!?
    dilbert ;-)

  4. #4


    The first question, apart from all the details, is: which tools do you want
    to use to process the output (the CSV file)?

    Do you plan to load it as a spreadsheet into an office suite, or do you
    want to put it into a database or some scripting solution?

    If you want to use it as a spreadsheet, you may run into trouble with too
    many rows in the result, and you will probably have to keep it split.
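    If you do keep it split for a spreadsheet, the combined file can be cut into pieces with GNU split. This is a minimal sketch, assuming GNU coreutils and a limit of about one million rows per sheet (check the actual row limit of your office suite); split_for_spreadsheet is a hypothetical name. A header line, if there is one, is not repeated in each piece.

```shell
#!/bin/bash
# split_for_spreadsheet: hypothetical helper built on GNU split.
# Usage: split_for_spreadsheet FILE [MAX_ROWS] [PREFIX]
# Writes PREFIX00.csv, PREFIX01.csv, ... with at most MAX_ROWS lines each.
split_for_spreadsheet() {
    local file=$1
    local max_rows=${2:-1000000}          # assumed per-sheet row limit
    local prefix=${3:-${file%.csv}-part-}
    # -d: numeric suffixes, -l: lines per piece,
    # --additional-suffix keeps the .csv extension for the office suite
    split -d -l "$max_rows" --additional-suffix=.csv -- "$file" "$prefix"
}
```

    Usage: split_for_spreadsheet restaurants.csv writes restaurants-part-00.csv, restaurants-part-01.csv, and so on.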

    To get everything into one file, you can just use a for loop in the shell:

    Code:
    rm -f restaurants.csv   # start fresh instead of appending to an old file; -f: no error if it is missing
    for x in germany-???.xml
    do
        xsltproc restaurants.xslt "$x" >> restaurants.csv
    done

  5. #5


    Hello Martin, thanks for the reply.

    I want to put the output into a MySQL database; I think that is the most elegant solution, since a CSV file alone does not fit here.

    I will give the loop approach you mentioned above a try later tonight and will report back on Sunday.

    greetings ;-)
    dilbert ;-)

  6. #6
    Join Date: Jun 2008 | Location: West Yorkshire, UK | Posts: 3,450


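    To load a CSV file into a MySQL table with empty fields stored as NULL, you need to replace all empty fields with \N; you also need to create the table structure before you import the file. Once you have done that, the command for a Linux file (lines ending in \n) is

    LOAD DATA INFILE 'filename' INTO TABLE tablename FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

    For tab-separated data, such as the output of the restaurants stylesheet, use FIELDS TERMINATED BY '\t' instead (that is also MySQL's default).

    In practice, you will probably need to use LibreOffice or Kate to add the \Ns.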
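    Instead of editing the file by hand, the \N markers can also be added with awk. This is a sketch, assuming tab-separated input such as the output of the restaurants stylesheet; empty_to_null is a hypothetical name. It relies on MySQL's default FIELDS ESCAPED BY '\\', under which \N in the data file is read as NULL.

```shell
#!/bin/bash
# empty_to_null: hypothetical helper. Replaces every empty tab-separated
# field with \N so that LOAD DATA INFILE stores NULL for it.
empty_to_null() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             for (i = 1; i <= NF; i++)
                 if ($i == "") $i = "\\N"   # awk string "\\N" is the two characters \N
             print
         }' "$@"
}
```

    Usage: empty_to_null restaurants.csv > restaurants-null.csv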
