PostGIS versus osmconvert - retrieval methods for German files

hello good morning,

What should I use:

PostGIS as a solution, with the database populated by osm2pgsql, or osmconvert/osmfilter to create CSV sheets?

The aim is to extract data from the German osm.pbf files - I want to get at the data (not to create maps again).

See the source(s): http://download.geofabrik.de/europe/germany.html - ranging from 10 MB (Bremen) to 390 MB (Nordrhein-Westfalen), the osm.pbf files are not too big. Question: which method is the best and most appropriate - storing the results in a MySQL db, or just having big calc sheets (with CSV data)?

  • only a straightforward, one-way export from the files
  • no back-import of data to OSM from a .csv file

I want to gather data from .o5m and .pbf files - several ways are possible. What is wanted: I am looking for restaurants, with all of the following tags:


@id @lon @lat amenity name address [as town, street, housenumber] website and email address.

All interests are based on the regions of Germany (see http://download.geofabrik.de/europe/germany.html); I have several files I want to work with.

So the main question: which method is the best and most appropriate - storing the results in a MySQL db, or just having big calc sheets (with CSV data)?

Well, what I want to accomplish can be done with osmfilter/osmconvert too. So I guess that for my OSM purposes it is easier to filter and manipulate data in simple CSV format … because there is no need to learn complex database queries.

For example:


$ ./osmfilter bremen.o5m --keep="addr:country= and addr:city= and addr:street=" --ignore-dependencies | ./osmconvert - --csv="@oname @id @lon @lat addr:country addr:city addr:street"

We can do a lot with osmconvert and osmfilter - e.g. using osmconvert to write a CSV file. The many sites I find when searching for CSV show that we can do lots of things with osmconvert and osmfilter, don't they!?
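
Just to illustrate this (only a rough sketch of mine, assuming the default tab separator of osmconvert's CSV output and a hypothetical result file called bremen.csv), a few lines of Perl are enough to post-process such a sheet:


#!/usr/bin/perl
# Rough sketch: post-process the tab-separated CSV written by the osmconvert
# command above; the column order follows the --csv option given there:
# @oname @id @lon @lat addr:country addr:city addr:street
use strict;
use warnings;

my $csv = shift || 'bremen.csv';    # placeholder file name
open my $fh, '<', $csv or die "Cannot open $csv: $!\n";

while ( my $line = <$fh> ) {
    chomp $line;
    my ( $otype, $id, $lon, $lat, $country, $city, $street ) = split /\t/, $line;
    next unless defined $street and $street ne '';    # keep only rows with a street
    print join( ';', $id, $lat, $lon, $city, $street ), "\n";
}
close $fh;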

But on the other hand:

Since I can do lots of things with the Overpass API, I wonder how well (and how easily) I can transfer the data into the MySQL db.

See the Overpass API request:


<query type="node">
  <has-kv k="place" v="city"/>
  <has-kv k="name" v="any town"/>
</query>
<query type="node">
  <around radius="1000"/>
  <has-kv k="shop"/>
</query>
<print/>

And the outcome:


<osm version="0.6" generator="Overpass API">
<note>
The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.
</note>
<meta osm_base="2013-03-07T14:54:02Z" areas="2013-03-07T10:37:02Z"/>
<node id="240486180" lat="50.9744274" lon="3.0152858">
<tag k="addr:housenumber" v="9"/>
<tag k="addr:street" v="Marktplaats"/>
<tag k="amenity" v="cafe"/>
<tag k="email" v="vandaelekoen67@skynet.be"/>
<tag k="name" v="Paviljoentje"/>
<tag k="opening_hours" v="Mo-Su 09:00+; Tu off; Th 09:00-14:00"/>
<tag k="phone" v="+3251636211"/>
<tag k="website" v="http://www.paviljoentjestaden.be"/>
</node>
<node id="244312208" lat="51.2461401" lon="5.4390455">
<tag k="amenity" v="cafe"/>
<tag k="created_by" v="JOSM"/>
<tag k="name" v="De Club"/>
</node>
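
(A side note: the request above can also be fetched directly from Perl by POSTing it to a public Overpass endpoint - this is only a sketch of mine; the endpoint URL and the "data" form field are the usual convention, and the query after __DATA__ is the one from above, reused verbatim.)


#!/usr/bin/perl
# Sketch: send the Overpass XML query shown above to the public endpoint
# and print the raw XML reply.
use strict;
use warnings;
use LWP::UserAgent;

my $query = do { local $/; <DATA> };    # the query is kept after __DATA__

my $ua  = LWP::UserAgent->new;
my $res = $ua->post( 'http://overpass-api.de/api/interpreter', { data => $query } );
die 'Overpass request failed: ' . $res->status_line . "\n" unless $res->is_success;
print $res->decoded_content;

__DATA__
<query type="node">
  <has-kv k="place" v="city"/>
  <has-kv k="name" v="any town"/>
</query>
<query type="node">
  <around radius="1000"/>
  <has-kv k="shop"/>
</query>
<print/>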

So - well, what would you say…

I think I can reach most of my desired goals with osmconvert and osmfilter, especially if I focus on the regional Germany datasets.
Note: all interests are based on the regions of Germany (see http://download.geofabrik.de/europe/germany.html). I have several
files I want to work with, ranging from 10 MB (Bremen) to 390 MB (Nordrhein-Westfalen) - the osm.pbf files are not too big.

What do you advise!? I guess that I can go with osmconvert and osmfilter to get CSV files, since the files that result from the German regional extracts are not too big!? What do you say!?

Looking forward to hearing from you.

As I’ve described in other related threads you’ve created, I use other tools instead of the specific ones you use.
So, putting aside the specific features of each approach you’re asking about, I’d probably answer your question in a more strategic sense…

Your approach recalls the mistakes made in the past, especially during the dotcom boom in the late 1990s, when solutions were built by technicians because the people with the money usually didn’t understand technology. They just threw money at the technicians and said “Here, build something that’s worthwhile” without detailing exactly what they wanted. Of course, without clear objectives, solutions were built according to the technicians’ guesses; some were successful to varying degrees and others weren’t.

Since you haven’t identified exactly what your final goals are, then - like many before you - the questions you’re asking can’t be answered well, because you haven’t identified the path to where you’re going.

Are you building a search engine?
Are you building an analytical engine?
Is your data static or dynamically changing or growing?
How will data be presented to the End User?
etc.

Answers to those questions will suggest how your data should be stored so that you can build effective and efficient queries, and the tools you might need to deliver results each step of the way.

HTH,
TSU

hi tsu

thx for replying

I want to have a way to do smart search and retrieval of OpenStreetMap data, which is dynamically changing and - of course!!! - growing day by day. So I need a clever way to do updates of the db.

What about MySQL, or using PostGIS!?

love to hear from you

**Note:** see the source(s): the Geofabrik Download Server - ranging from 10 MB (Bremen) to 390 MB (Nordrhein-Westfalen), the osm.pbf files are not too big. Question: which method is the best and most appropriate - storing the results in a MySQL db, or just having big calc sheets (with CSV data)?

  • only a straightforward, one-way export from the files
  • no back-import of data to OSM from a .csv file

I haven’t read about any solutions using MySQL, but I’ve read about some solutions based on PostgreSQL.

But in general, leading-edge solutions nowadays do not use either - or, for that matter, any kind of RDBMS.
Although RDBMS have established a long record of reliable support for LOB apps, they are fundamentally unsuited for data and solutions which are likely going to change over time. It’s well recognized that how you define data today may not be how you will be using and looking at data in the future… Data formats may change, you may want to execute different types of queries, you may want to augment your data with new sources, etc.

That is why everyone is building using NoSQL.
MongoDB is very popular, especially as an introductory technology. MongoDB also has some pre-built connectors that ease importing/exporting data from/to an RDBMS, although the others are getting pretty good at this, too.
Hadoop/Solr/Hive/Pig is the general standard, but requires a very steep learning curve. As “the” standard, you’ll also find plenty of platform options like AWS.
The Elasticsearch/logstash stack I’ve mentioned is an alternative to the Hadoop stack which attempts to re-build the entire stack on common standards (particularly the use of JSON).
As I mentioned, PostgreSQL today can be implemented to “sort of” support some NoSQL type features, but because PostgreSQL is fundamentally a RDBMS, those are more workarounds than proper solutions.
I should include the fact that many RDBMS do support structures which are not restricted by the schema; the ones I know about typically create a revised meta-schema. Examples of this are “cubing” data using OLAP (Online Analytical Processing) or something close to it.

Any of the above (and there may be more choices) might be sufficient for your requirements.
The reason why NoSQL works so well for this type of data is that, unlike an RDBMS - which requires a schema to be built <before> you insert data, and whose schema then cannot easily be modified - NoSQL implements no fixed tables, and indexes are a metadata layer that can be modified, sometimes even during runtime.

In order to avoid mis-steps, you will need to do plenty of reading to identify the solution that best suits your objectives. Even what I described is only a first step or two, and nowhere close to a complete solution.

HTH,
TSU

Hello dear TSU

All I want is to get POIs out of the OSM file, and to keep them in a daily-updated version:

in other words - I want to keep them updated, day by day…

Well, I guess that I can do that with osmfilter and osmconvert - but there are probably better ways to do it:

  • PostGIS as a solution, with the database populated by osm2pgsql, or osmconvert/osmfilter to create CSV sheets: probably PostGIS is too big and the learning curve is too steep. Note: I only want to have POIs from the OSM file, nothing more.
  • MySQL seems to be better for what I have in mind - what do you think??

WHAT IS AIMED AT: the aim is to extract data from the German osm.pbf files - to get at the data (not to create maps again).

See the source(s): http://download.geofabrik.de/europe/germany.html - ranging from 10 MB (Bremen) to 390 MB (Nordrhein-Westfalen), the osm.pbf files are not too big. Question: which method is the best and most appropriate - storing the results in a MySQL db, or just having big calc sheets (with CSV data)?

  • only a straightforward, one-way export from the files
  • no back-import of data to OSM from a .csv file

LOVE TO HEAR FROM YOU

GREETINGS

One of the unique qualities of OpenStreetMap is its continuous updating by all those thousands of contributors around the globe. Ideally you would want to reflect that in your POI extract. My question is:

**What is the workflow for keeping an up-to-date OSM-based POI database that performs well?**

To make this a little more concrete, here’s what I currently do:

But I assume that the database grows because the --rri task replicates all changes, and not just the POIs I’m interested in. So derived questions are:

  • Is there a way to filter change streams in osmosis before writing them to an output stream?
  • Is the workflow described a good way to approach this challenge?

Yeah,
I don’t know all the specific reasons you’re running initialization tasks… possibly:

  • You want to replicate your new data to your main repository
  • You are running a database server farm for whatever reason

So, here are some other descriptions of what I’m working with so you can compare with what you are currently doing…

I’m just glad that, with the stack I’m using (Elasticsearch, which is a Hadoop-style NoSQL database), I’ve left all that behind. Standard Operating Practice for importing data with the software I’m using is to simply push data to a port, nothing else. If the data has the same structure as existing data, then the data is appended or overwritten. If the new data has a different structure, then it’s just appended and integrated.
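
Just to illustrate the idea with a rough sketch (the host, the index name "pois" and the type name "poi" below are placeholders, not necessarily my actual setup): with Elasticsearch, "pushing data to a port" is simply an HTTP request carrying a JSON document.


#!/usr/bin/perl
# Sketch: index one POI as a schema-less JSON document in Elasticsearch.
# Assumes a node listening on localhost:9200; index/type names are made up.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use JSON;

my $poi = {
    id      => 2126473801,
    name    => 'Vapiano',
    amenity => 'restaurant',
    lat     => 49.4851170,
    lon     => 8.4756295,
};

my $url = "http://localhost:9200/pois/poi/$poi->{id}";
my $req = HTTP::Request->new( PUT => $url );
$req->header( 'Content-Type' => 'application/json' );
$req->content( encode_json($poi) );

my $res = LWP::UserAgent->new->request($req);
print $res->status_line, "\n", $res->decoded_content, "\n";


If a later document carries extra keys (say "website" or "opening_hours"), it can be sent exactly the same way - no schema change needed, which is the point I'm making about NoSQL stores.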

Also, typical of most no-sql databases, my solution doesn’t have a theoretical max capacity. If needs exceed existing machines, I just add another physical machine (node) and the data re-arranges itself. And, if fault tolerance is an objective, I can specify how the data is distributed or mirrored across the cluster.

The software also supports a plugin architecture, and numerous plugins have already been written and are easily available to support both inputs and outputs to <each and every> individual application in the stack, so I usually don’t have to wonder too long how I’m going to get data from one place to another, both within the stack and when working with other widely used software.

There is also flexibility in setting up the “flow” of the software pieces, eg I can insert instances of apps like Redis or memcache to queue data so that if some parts of the flow are congested it won’t affect the entire flow. This is especially important if you might be importing data from multiple sources, or if parts of the solution might require more resources (eg indexing, data transformation). Although queuing is generally nice to have, it becomes more important if you’re streaming inputs.

An important part of any solution I haven’t seen you describe is the user interface. Are you going to simply return GIS co-ordinates, or are you mapping? How complex might your queries be, eg are you querying for multiple points and needing to calculate distances, boundaries and shapes? This is one of the most important parts of the solution to decide, and often these tools will only work with certain types of inputs. Some will work with shapefiles, others won’t, eg Google Earth doesn’t support them directly, but an intermediate “translation” method is documented.

As for your question “MySQL or PostgreSQL” I don’t think that is too important. The SQL languages for both are similar enough to work with the other if necessary. PostgreSQL is often preferred due to features MySQL doesn’t have. Both will suffer from issues related to requiring data schemas.

Think strategically based on needs and solutions, not on specific technical pieces. It’s the difference between a coder and an experienced project manager. Only when you’ve detailed the overall solution should you then focus on the specific software to be used and only after that then the exact code.

HTH,
TSU

hello dear TSU

well many many thanks for the answer and all your hints

I do not need to create any maps. All I want to do is to fetch POIs from the planet file.
Note: my main interest is in the German files, ranging from 10 MB to 400 MB - see:

http://download.geofabrik.de/europe/germany.html
See the files from approx. 10 MB (Bremen) to 390 MB (Nordrhein-Westfalen) - these small osm.pbf files are the sources I am interested in.

see a project with some ideas: A self-updating OpenStreetMap database of US bridges – a step-by-step guide.

http://oegeo.wordpress.com/2012/03/06/a-self-updating-openstreetmap-database-of-us-bridges-a-step-by-step-guide/

Note: this guy looks for the bridges in the US:

There are about 125,000 of them – for now loosely defined as ‘ways that have the ‘bridge’ tag‘.
So on the scale of OpenStreetMap data it’s a really small subset. In terms of the tools and processes needed,
the task seems easy enough, and as long as you are satisfied with a one-off solution, it really is. You would need only four things:

A planet file
A boundary polygon for the United States
A PostGIS database loaded with the osmosis snapshot schema and the linestring extension
osmosis, the OpenStreetMap ETL swiss army tool.

That, and a single well-placed osmosis command: […] (end of citation; source: http://oegeo.wordpress.com/2012/03/06/a-self-updating-openstreetmap-database-of-us-bridges-a-step-by-step-guide/)

The guy: “I want all bridges in the US that are mapped in OpenStreetMap in a PostGIS database that stays as up-to-date as possible, reflecting all the latest changes.”

well - the guy looks for bridges in the US

I am looking to search for POIs that are stored in the files at http://download.geofabrik.de/europe/germany.html -
see the files from approx. 10 MB (Bremen) to 390 MB (Nordrhein-Westfalen).

I only want to have the data - in a text-based manner - in the PostgreSQL db
to do some clever / smart retrieval
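
As far as I understand, a plain text-based POI table does not need anything PostGIS-specific. A minimal sketch with DBI and DBD::Pg (database name "osm", user and password are placeholders; CREATE TABLE IF NOT EXISTS needs PostgreSQL 9.1 or newer; the sample row is the cafe from the Overpass output earlier in the thread):


#!/usr/bin/perl
# Sketch: a plain (non-PostGIS) POI table in PostgreSQL, filled via DBI/DBD::Pg.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Pg:dbname=osm;host=localhost', 'user', 'pwd',
                        { RaiseError => 1, AutoCommit => 1 } );

$dbh->do(q{
    CREATE TABLE IF NOT EXISTS pois (
        id      BIGINT PRIMARY KEY,
        lat     DOUBLE PRECISION NOT NULL,
        lon     DOUBLE PRECISION NOT NULL,
        name    TEXT,
        amenity TEXT,
        website TEXT
    )
});

my $sth = $dbh->prepare(
    'INSERT INTO pois (id, lat, lon, name, amenity, website) VALUES (?,?,?,?,?,?)'
);
$sth->execute( 240486180, 50.9744274, 3.0152858,
               'Paviljoentje', 'cafe', 'http://www.paviljoentjestaden.be' );

$dbh->disconnect;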

I hope the project is now a bit clearer… just ask if not!

Thanks a lot for all your ideas & help.

TSU, I'd love to hear from you.

greetings.

dilbertone

Btw, note: currently I am musing about the setup of the PostgreSQL db (see the other thread):
https://forums.opensuse.org/showthread.php/492499-installing-postgreSQL-on-13-1

Btw: as I mentioned the XML format at the start of the thread - well, I want to use Perl for text mangling, so we can use the XML::Simple module. Here's an example of a little script to parse XML:

see the Code:


 
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# file name from the command line
my $xmlfile = shift || die "Usage: $0 <XML_FILE>\n";

my $ref;
eval {
  # parse the whole file into one Perl data structure
  $ref = XMLin($xmlfile,
    ForceArray    => 0,
    KeyAttr       => [],
    SuppressEmpty => '',
  ) or die "Can't read XML from $xmlfile: $!\n";
};
die $@ if $@;

# dump the structure to see what we got
print Dumper $ref;



We can do good things with it - for example with input data like this:



<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base=":49:02Z"/>

  <node id="297467" lat="49.5014" lon="8.1465">
    <tag k="addr:city" v="Stuttgart"/>
    <tag k="addr:housenumber" v="23"/>
    <tag k="addr:postcode" v="69115"/>
    <tag k="addr:street" v="Sofienstraße"/>
    <tag k="name" v="ARLT"/>
    <tag k="phone" v="+49 6221 20229"/>
    <tag k="shop" v="computer"/>
    <tag k="source" v="survey"/>
    <tag k="website" v="http://www.arlt.com"/>
    <tag k="wheelchair" v="yes"/>
  </node>
  <node id="305144906" lat="49.40012" lon="8.6929652">
    <tag k="addr:city" v="Mainz"/>
    <tag k="addr:country" v="DE"/>
    <tag k="addr:housenumber" v="13-15"/>
    <tag k="addr:postcode" v="69115"/>
    <tag k="addr:state" v="Baden-Württemberg"/>
    <tag k="addr:street" v="Rohrbacher Straße"/>
    <tag k="name" v="Heidel-bike"/>
    <tag k="opening_hours" v="Tu-Fr 10:00-18:30; Sa 10:00-14:00"/>
    <tag k="shop" v="bicycle"/>
    <tag k="website" v="http://www.heidelbike.de/"/>
    <tag k="wheelchair" v="yes"/>
  </node>



I ran this, and it looks nice… Do you think that this is an appropriate way to do it?

Another suggestion to solve this: **XML::Twig**

I’m not an XML expert, but I think that we can do this too: I put the modified data in a file; we
can see the content after the __END__ token in the script:

use strict;
use warnings;
use XML::Twig;

# for every <node>: print its attributes, then the value of each child <tag>
my $t = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        'node' => sub {
            my ( $twig, $node ) = @_;
            print $node->att('id'), ', ', $node->att('lat'), ', ', $node->att('lon'), ', ';
            foreach my $tag ( $node->children ) {
                print $tag->att('v'), ', ';
            }
            print "\n";
        },
    },
);


see this

print "#ID, LAT, LON, CITY, HOUSNUMBER, POSTCODE, STREET, NAME, PHONE, SHOP, SOURCE, WEBSITE, WHEELCHAIR
";
$t->parsefile('xml-001.xml');

__END__

see the data


<?xml version="1.0" encoding="UTF-8"?>
<root>
<node id="297467" lat="49.5014" lon="8.1465">
    <tag k="addr:city" v="Stuttgart"/>
    <tag k="addr:housenumber" v="23"/>
    <tag k="addr:postcode" v="69115"/>
    <tag k="addr:street" v="Sofienstrae"/>
    <tag k="name" v="ARLT"/>
    <tag k="phone" v="+49 6221 20229"/>
    <tag k="shop" v="computer"/>
    <tag k="source" v="survey"/>
    <tag k="website" v="http://www.arlt.com"/>
    <tag k="wheelchair" v="yes"/>
</node>
<node id="305144906" lat="49.40012" lon="8.6929652">
<tag k="addr:city" v="Mainz"/>
<tag k="addr:country" v="DE"/>
<tag k="addr:housenumber" v="13-15"/>
<tag k="addr:postcode" v="69115"/>
<tag k="addr:state" v="Baden-WUrttemberg"/>
<tag k="addr:street" v="Rohrbacher StraSSe"/>
<tag k="name" v="Heidel-bike"/>
<tag k="opening_hours" v="Tu-Fr 10:00-18:30; Sa 10:00-14:00"/>
<tag k="shop" v="bicycle"/>
<tag k="website" v="http://www.heidelbike.de/"/>
<tag k="wheelchair" v="yes"/>
</node>
</root>

__OUTPUT__
#ID, LAT, LON, CITY, HOUSENUMBER, POSTCODE, STREET, NAME, PHONE, SHOP, SOURCE, WEBSITE, WHEELCHAIR
297467, 49.5014, 8.1465, Stuttgart, 23, 69115, Sofienstraße, ARLT, +49 6221 20229, computer, survey, http://www.arlt.com, yes,
305144906, 49.40012, 8.6929652, Mainz, DE, 13-15, 69115, Baden-Württemberg, Rohrbacher Straße, Heidel-bike, Tu-Fr 10:00-18:30; Sa 10:00-14:00, bicycle, http://www.heidelbike.de/, yes,

Well, this can be done with some neat Perl modules…

TSU, I'd love to hear from you.

greetings.

hello dear TSU

Well, I have another approach:

See the following ideas: the data I have is derived from the planet file - cf. OpenStreetMap. After receiving the data, I'd subsequently like to pass it over to the MySQL db.

Note: can I use the already installed and activated MySQL database? See the code - the Perl code that uses DBI:


 
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use XML::Twig;

# prepare database
my $dbh = dbh();    # connect
init();             # create database and table if needed
$dbh->do('USE db123');
#$dbh->do('DELETE FROM pois');

# sql
my $sql = 'REPLACE INTO pois VALUES (?,?,?,?,?,?)';
my $sth = $dbh->prepare($sql);

# set up handler
my $t = XML::Twig->new(
  twig_handlers => { 'node' => \&node }
);

# parse xml - expects the OSM XML to follow a __DATA__ token at the end of this script
my $xml = do { local $/; <DATA> };
$t->parse($xml);
#$t->parsefile('.osm');

sub node {
  my ( $t, $elt ) = @_;

  my %data = (
    'id'  => $elt->att('id'),
    'lat' => $elt->att('lat'),
    'lon' => $elt->att('lon'),
  );
  for my $tag ( $elt->children() ) {
    $data{ $tag->att('k') } = $tag->att('v');
    #print $tag->att('k') . ' = ' . $tag->att('v') . "\n";
  }

  # update database
  my @f = map { $data{$_} // '' } ('id','lat','lon','name','amenity','operator');
  if ( $f[3] ne '' && $f[4] ne '' && $f[5] ne '' ) {
    print "-- INSERT --\n" . ( join "\n", @f ) . "\n\n";
    $sth->execute(@f);
  }
}

sub init {
  $dbh->do('CREATE DATABASE IF NOT EXISTS db123
            DEFAULT CHARACTER SET latin1
            COLLATE latin1_german2_ci');
  $dbh->do('USE db123');
  $dbh->do('CREATE TABLE IF NOT EXISTS pois (
            id       BIGINT(20) UNSIGNED NOT NULL,
            lat      FLOAT(10,7) NOT NULL,
            lon      FLOAT(10,7) NOT NULL,
            name     VARCHAR(255) COLLATE utf8_bin NOT NULL,
            amenity  VARCHAR(255) COLLATE utf8_bin NOT NULL,
            operator VARCHAR(255) COLLATE utf8_bin NOT NULL,
            PRIMARY KEY (id)
           ) ENGINE=MyISAM DEFAULT
             CHARSET=utf8
             COLLATE=utf8_bin');
}

sub dbh {
  # connect without selecting a database; the database is created in init()
  my $dsn = 'DBI:mysql:database=;host=localhost';
  my $dbh = DBI->connect( $dsn, 'user', 'pwd',
                          { RaiseError => 1, PrintError => 1 } )
            or die "Error connecting: $DBI::errstr";
  return $dbh;
}




See the XML file that comes out of an OSM parser - it is formatted in XML. Which method fits if I want to store the output of this request in a MySQL database? The dataset I have gathered from this site: http://overpass-turbo.eu

See the output here - the XML file that has the data that I want to store in the MySQL database.



<node id="2064639440" lat="49.4873181" lon="8.4710548"> 
    <tag k="amenity" v="restaurant"/> 
    <tag k="cuisine" v="turkish"/> 
    <tag k="email" v="info@lynso.de"/> 
    <tag k="name" v="Kilim  - Café und Bar Restaurant"/> 
    <tag k="opening_hours" v="Su-Th 17:00-1:00; Fr, Sa 17:00-3:00"/> 
    <tag k="operator" v="Cengiz Kaya"/> 
    <tag k="phone" v="06 21 - 43 755 371"/> 
    <tag k="website" v="http://www.kilim-mannheim.de/"/> 
  </node> 
  <node id="2126473801" lat="49.4851170" lon="8.4756295"> 
    <tag k="amenity" v="restaurant"/> 
    <tag k="cuisine" v="italian"/> 
    <tag k="email" v="mannheim1@vapiano.de"/> 
    <tag k="fax" v="+49 621 1259 779"/> 
    <tag k="name" v="Vapiano"/> 
    <tag k="opening_hours" v="Su-Th 10:00-24:00; Fr-Sa 10:00-01:00"/> 
    <tag k="operator" v="Vapiano"/> 
    <tag k="phone" v="+49 621 1259 777"/> 
    <tag k="website" v="http://www.vapiano.de/newsroom/?store=29"/> 
    <tag k="wheelchair" v="yes"/> 
  </node>



Well - the questions are the following:

  • how to handle the XML file
  • can I make use of the already installed MySQL db?
  • how to make use of the XML file - note: the XML file is a very, very big file (see the sketch after this list)
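
On the last point: I think XML::Twig can stream even a very large file if every <node> is purged from memory once it has been handled - a sketch (the amenity filter is just an example):


#!/usr/bin/perl
# Sketch: stream a very large OSM XML file with XML::Twig; purge each parsed
# chunk so that memory use stays flat instead of growing with the file size.
use strict;
use warnings;
use XML::Twig;

my $file = shift || die "Usage: $0 <OSM_XML_FILE>\n";

my $t = XML::Twig->new(
    twig_handlers => {
        'node' => sub {
            my ( $twig, $node ) = @_;
            my %tag = map { $_->att('k') => $_->att('v') } $node->children('tag');
            # keep only POI nodes, e.g. everything carrying an amenity tag
            if ( exists $tag{amenity} ) {
                print join( "\t",
                    $node->att('id'), $node->att('lat'), $node->att('lon'),
                    $tag{amenity}, $tag{name} // '' ), "\n";
            }
            $twig->purge;    # free everything that has been parsed so far
        },
    },
);

$t->parsefile($file);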

Btw - what about the fields of the db? I mean this part of the code:

# update database
my @f = map { $data{$_} // '' } ('id','lat','lon','name','amenity','operator');

So - in other words: if I need more fields in the database for the POIs (that is, the dataset that I get from the planet file), then I extend the above-mentioned code line!?
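
I guess the extension has to happen in three places, so that the table, the placeholders and the value list stay in sync - a sketch (the extra columns "website" and "phone" are just examples):


# 1) add the new columns to the CREATE TABLE statement in init():
#      website  VARCHAR(255) COLLATE utf8_bin NOT NULL,
#      phone    VARCHAR(255) COLLATE utf8_bin NOT NULL,

# 2) add one placeholder per column to the REPLACE statement:
my $sql = 'REPLACE INTO pois VALUES (?,?,?,?,?,?,?,?)';

# 3) extend the field list in node(), in the same order as the columns:
my @f = map { $data{$_} // '' }
        ('id','lat','lon','name','amenity','operator','website','phone');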

Love to hear from you
:wink: