We give an overview of the components: which files and processes constitute them, and which log messages allow their state to be monitored. The most important component is the core. It is subdivided into the backend, the frontend, and the dispatcher. Then we discuss the additional modules: the meta data component, the XAPI compatibility layer, and the area generator.
The backend is responsible for populating the database and keeping it up to date. The backend doesn't depend on the other components. However, if the tools are called without a running dispatcher, they explicitly need to know the location of the database directory with a parameter --db-dir.
The first task is to fetch data and diff files from the OSM main API (or possibly another source) and to store them locally. It is the administrator's task to download the initial data, because a download of 30 GB or more should not be triggered without user intervention.
By contrast, the much smaller diffs (typically some 100 KB per minute) are downloaded by the process fetch_osc.sh and stored in the directory for minute diffs. This directory replicates the structure of the source: it contains subdirectories 000 to 999, each of which again contains subdirectories 000 to 999, which in turn contain the files 000.osc.gz to 999.osc.gz and 000.state.txt to 999.state.txt. The files are not needed once their content has been patched into the database, but they are nonetheless stored permanently. If a file hasn't been downloaded successfully by fetch_osc.sh, it is automatically downloaded again. fetch_osc.sh logs its activities into fetch_osc.log in the directory for minute diffs.
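The three-level layout can be illustrated with a short Python sketch. It assumes that each diff is identified by a sequence number of at most nine digits, and that the zero-padded digits are split into three groups of three to form the path. The function name diff_paths and the example base directory are hypothetical and not part of the tools described here.

```python
from pathlib import PurePosixPath


def diff_paths(base_dir: str, sequence: int) -> tuple[PurePosixPath, PurePosixPath]:
    """Return the (.osc.gz, .state.txt) paths for one minute diff.

    Assumption: the sequence number maps onto the 000..999 directory
    levels by splitting its nine zero-padded digits into three groups.
    """
    if not 0 <= sequence <= 999_999_999:
        raise ValueError("sequence number must have at most nine digits")
    digits = f"{sequence:09d}"
    top, middle, leaf = digits[0:3], digits[3:6], digits[6:9]
    directory = PurePosixPath(base_dir) / top / middle
    return directory / f"{leaf}.osc.gz", directory / f"{leaf}.state.txt"


osc, state = diff_paths("/path/to/minute_diffs", 1234567)
print(osc)    # /path/to/minute_diffs/001/234/567.osc.gz
print(state)  # /path/to/minute_diffs/001/234/567.state.txt
```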
The additional files in this directory come from the various extensions. No direct logging is performed. The progress can only be read off the console output, usually the content of the file nohup.out. Once the process has completed its job, it writes "Update complete." and terminates.
The third task is to update the database with the minute diff files. For this purpose, the daemon apply_osc_to_db.sh runs permanently and starts update_from_dir whenever one or more diffs are to be processed. Before update_from_dir is started, apply_osc_to_db.sh decompresses all diff files into a temporary directory, which is deleted after update_from_dir has completed. update_from_dir changes the same files in the database as update_database. apply_osc_to_db.sh logs its current state into apply_osc_to_db.log; it writes the current replicate id into the file replicate_id and the timestamp of the currently processed OSM main data version into the file osm_base_version.
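The decompress, process, and delete cycle described above could look roughly like the following Python sketch. It is not the code of apply_osc_to_db.sh: the function apply_diffs and the callback update (which stands in for the call to update_from_dir) are hypothetical names chosen for illustration, and the numbering of the decompressed files is an assumption.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def apply_diffs(diff_files: Iterable[Path], update: Callable[[Path], None]) -> None:
    """Decompress .osc.gz files into a temporary directory, pass that
    directory to `update`, and remove the directory afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        for index, source in enumerate(diff_files):
            # Number the outputs so that equally named diffs from
            # different subdirectories do not overwrite each other.
            target = tmp_dir / f"{index:06d}.osc"
            with gzip.open(source, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
        # Hypothetical stand-in for starting update_from_dir.
        update(tmp_dir)
    # Leaving the with-block deletes the temporary directory,
    # also when update() has raised an exception.
```

Using a context manager for the temporary directory keeps the cleanup in one place, so no decompressed files are left behind if the update step fails.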
The dispatcher coordinates backend and frontend. Thus, there is no data associated with the dispatcher, but several overhead files, the process dispatcher, and the logfile transactions.log.
The dispatcher organizes its communication with the backend and the frontend by three different means. First, files ending in .lock are managed by the dispatcher and contain a PID. Having these as physical files ensures that after a crash the data suffices to recover correctly. The second means is shared memory, in particular the shared memory file /osm3s_v0.6.98_osm_base (on Linux, you can access this file via /dev/shm/osm3s_v0.6.98_osm_base). The dispatcher announces the database location in this file, and the frontend and backend can report their status and request locks there. The third means is the Unix domain socket osm3s_v0.6.98_osm_base in the database directory. Each reading or writing process holds, during its whole lifetime, a permanent connection as a client to the dispatcher, which acts as the server. Thus, if a process is killed, the dispatcher can notice this because the connection is suddenly lost.
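How a server can notice a lost client connection is illustrated by the following Python sketch. It shows the general behaviour of stream sockets, not the dispatcher's actual implementation; the helper peer_has_disconnected is a hypothetical name, and socket.socketpair() merely stands in for a real client and server connected through a Unix domain socket.

```python
import socket


def peer_has_disconnected(connection: socket.socket) -> bool:
    """Return True if the other end of a non-blocking stream socket has closed.

    A zero-length result from recv() on a stream socket means end of file,
    which is what a server sees after its client has exited or been killed.
    MSG_PEEK leaves any pending data in the receive buffer.
    """
    try:
        data = connection.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        return False  # nothing to read, but the connection is still open
    except ConnectionResetError:
        return True
    return data == b""


server_side, client_side = socket.socketpair()
server_side.setblocking(False)
print(peer_has_disconnected(server_side))  # False
client_side.close()                        # simulates a killed client
print(peer_has_disconnected(server_side))  # True
server_side.close()
```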
The other overhead files are the .shadow files: there is one for each file in the database ending in .bin, .map, or .idx. These files enable transactionality: all index information for the writing process is kept in the various .shadow files, while the plain .idx files remain valid and are used by all processes that request a read lock during the write operation. If the system crashes, one can simply discard the .shadow files, and the remaining database is still consistent with the original .idx files.
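The shadow-file idea can be sketched in Python as follows. This is only an illustration of the general pattern under stated assumptions: that the shadow file is named by appending .shadow to the file name, and that a commit is done by atomically renaming the shadow file over the original. How the database actually commits is not described here, and the function names are hypothetical.

```python
import os
from pathlib import Path


def shadow_of(index_file: Path) -> Path:
    """Assumed naming scheme: append '.shadow' to the file name."""
    return index_file.with_name(index_file.name + ".shadow")


def write_shadow(index_file: Path, new_content: bytes) -> None:
    """The writer only touches the shadow file; readers keep using the original."""
    shadow_of(index_file).write_bytes(new_content)


def commit(index_file: Path) -> None:
    """Make the new index visible in one step (assumed commit strategy).

    os.replace is atomic on POSIX when both paths are on the same filesystem.
    """
    os.replace(shadow_of(index_file), index_file)


def discard_shadows(db_dir: Path) -> None:
    """Crash recovery: drop unfinished shadow files, keep the original files."""
    for shadow in db_dir.glob("*.shadow"):
        shadow.unlink()
```

Until commit is called, a crash leaves the original file untouched, so calling discard_shadows restores a consistent state.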
As the coordination of multiple processes is a troublesome task, the log file transactions.log is very verbose. Every message starts with the date in UTC and the PID of the process which has emitted this message.
The dispatcher itself logs all the messages it has accepted. In particular these are:
In addition, the dispatcher logs its own activities with purge and Dispatcher just started. purge is the final stage of a watchdog: if a process hasn't called back with a prolongate for too long, or if the socket connection has been lost, the dispatcher assumes that the process has been aborted or has run away, and releases the remaining read locks of this process.
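The watchdog logic can be sketched in a few lines of Python. This is a simplified model and not the dispatcher's code: the class Watchdog, its timeout parameter, and the explicit now argument (used to keep the example deterministic) are assumptions made for illustration, and the release of the read locks is reduced to forgetting the process.

```python
import time
from typing import Optional


class Watchdog:
    """Tracks when each reading process last sent a prolongate message."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: dict[int, float] = {}

    def prolongate(self, pid: int, now: Optional[float] = None) -> None:
        """Record that process `pid` is still alive."""
        self.last_seen[pid] = time.monotonic() if now is None else now

    def purge(self, now: Optional[float] = None) -> list[int]:
        """Forget all processes that have been silent for too long.

        Returns the purged PIDs; a real dispatcher would release
        the read locks of these processes at this point.
        """
        now = time.monotonic() if now is None else now
        expired = sorted(
            pid for pid, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
        for pid in expired:
            del self.last_seen[pid]
        return expired


watchdog = Watchdog(timeout_seconds=10)
watchdog.prolongate(4711, now=0)
watchdog.prolongate(4712, now=5)
watchdog.prolongate(4711, now=8)   # 4711 calls back in time
print(watchdog.purge(now=16))      # [4712]
```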
For each message, the processes themselves log their attempt to emit it, as well as the moment when they were satisfied with the outcome (usually because they obtained their lock) or when they timed out while waiting for a response.
The frontend usually runs in parallel to the backend and the dispatcher. However, if no updates take place, the frontend can also be used on a filled database without the backend and the dispatcher. The frontend uses one process, interpreter or osm3s_query, for each query; the process terminates when the query is complete or has been aborted. It accesses all files in the database directory read-only, with the exception of the log file transactions.log.
The difference between interpreter and osm3s_query is that interpreter is designed for CGI access. Thus it neither needs nor accepts parameters on the command line, it always starts its output with a MIME type or HTTP headers, and all error messages are delivered as HTML output. By contrast, osm3s_query can be adapted with various command line parameters and produces plain text error messages without MIME or HTTP headers.
Beyond the communication with the dispatcher, the frontend logs the requested query, just as the user has entered it, again with date and the PID of the request, into the file transactions.log.
The meta data component is included at both source code level and source data level. Thus, it changes the behaviour of almost all core components.
In particular, update_database, apply_osc_to_db.sh, dispatcher, interpreter, and osm3s_query now care for the following additional files in the same way they do for the other files in the database directory:
For each of these programs, this behaviour is triggered by adding a --meta parameter to the startup call.
The only functional difference is that features depending on meta data, like out meta;, are only operational with this extension. The essential non-functional difference is that everything now runs two to three times slower.
The XAPI compatibility layer is completely separated from the core. It consists of two components. The first is the process xapi, which is started for each query; it calls the program translate_xapi to transform the query into an Overpass XML query. The transformed query is saved in a temporary file in the directory /tmp/translate_xapi. Then xapi invokes osm3s_query with the translated query.
The second component is the permanently running helper process cleanup_xapi.sh. This process creates the directory /tmp/translate_xapi and removes the orphaned temporary files that result from aborted queries.
The XAPI compatibility layer doesn't directly read or write any files besides the temporary files. It also doesn't log anything directly. An XAPI query can only be reconstructed indirectly from the translated query in transactions.log.
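A cleanup step of this kind might look like the following Python sketch. How cleanup_xapi.sh actually recognizes orphaned files is not described here; the sketch assumes that a temporary file counts as orphaned once its modification time is older than a chosen limit. The function remove_orphans and its parameters are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def remove_orphans(directory: Path, max_age_seconds: float,
                   now: Optional[float] = None) -> list[Path]:
    """Create `directory` if needed and delete files older than the limit.

    Assumption: age is the only criterion for treating a file as orphaned.
    Returns the list of removed files.
    """
    directory.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now
    removed = []
    for entry in sorted(directory.iterdir()):
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry)
    return removed
```

A helper process could call such a function periodically, for example once per minute, with a limit well above the longest expected query runtime.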
The area generator consists of an additional permanent process rules_loop.sh, an additional log file rules_loop.log and additional files in the database directory.
The additional permanent process rules_loop.sh runs in an infinite loop; in each iteration, it has the process osm3s_query process all potential areas in one batch. In keeping with the concept of transactionality, the newly created areas become visible all at once at the end of each batch run of osm3s_query. The areas are contained in the following files in the database directory:
Additionally, the file area_version always contains the timestamp of the data as a reading process would currently get it. All the data is visible to the frontend and is used when a user requests data derived from area data.
The log file rules_loop.log contains information about the beginning and end of each batch run.