When working on a mapping project, I wanted to use some data that I could only find in the .vtpk format. I was using Python for my project, and I didn’t find a Python library that would let me read this kind of file, so I ended up writing one myself. This post describes the file format and links to the library on GitHub.
Mapping Tiles
The name of the file format (vtpk) stands for “Vector Tile Package”. VTPK files are most often (only?) produced by ArcGIS, the GIS software from Esri, the “big GIS”. Their software is used by many companies and government agencies in the USA. You can find files in this format by searching for "vtpk" on arcgis.com.
The word “tile” has a specific meaning in GIS: it usually refers to a pre-rendered map square, in a system where these squares are available at different levels of detail (LODs). For example, when you look at an interactive map, the client (browser) requests the map squares (tiles) for the zoom level you are viewing. As you pan around, adjacent tiles are requested; if you zoom in or out, tiles for a different level of detail are needed. A Google Maps URL (such as https://www.google.com/maps/@42.3124065,-71.0468355,10z) carries the LOD/zoom as the third value after the @, marked with a trailing z: in this example, the LOD/zoom is 10.
Generally speaking, zooms/LODs start at 0 for the most zoomed-out (least detailed) level and go up from there for more detailed tiles. As of this writing, the maximum z parameter for Google Maps is 21, and that is about the right order of magnitude for anything short of a map with millimeter-level detail: the Web Mercator projection (the projection most commonly used for interactive computer maps) spans about 40 million meters in both width and height (roughly 20 million meters on either side of the origin), so at zoom level 21 a single tile is about 19x19 meters, or about 63’x63’, at the equator.
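To make the zoom arithmetic concrete, here is a minimal sketch of the widely used “slippy map” conversion from longitude/latitude to a tile column and row at a given zoom. The function name `lonlat_to_tile` is my own, and the sketch assumes the usual Web Mercator scheme with tile (0, 0) in the top-left corner; a particular server or package may number its rows differently.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) tile containing a point.

    Assumes the common Web Mercator tiling with the origin in the
    top-left corner (x grows eastward, y grows southward).
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# The point and zoom taken from the Google Maps URL above.
print(lonlat_to_tile(-71.0468355, 42.3124065, 10))
```

For that Boston-area point at zoom 10, this gives column 309 and row 378, out of 1024 tiles in each direction.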
The way these tiles are usually logically stored on and retrieved from a server is via a quad tree. At the top level (LOD=0), you have a single square for the entire world. Each successive level of detail subdivides each tile of the previous level into four tiles, so while LOD=0 has only one tile for the whole world, LOD=1 has four tiles, LOD=2 has sixteen, and so on. The animation below shows this subdivision for levels of detail 0 through 5.
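In code, the same quad-tree relationship can be sketched in a few lines. The helper names here (`tiles_at_lod`, `children`, `parent`) are hypothetical, and the (x, y) numbering assumes the common convention where a tile's children sit at twice its coordinates.

```python
def tiles_at_lod(lod):
    """Total number of tiles at a level of detail: 4 ** lod."""
    return 4 ** lod


def children(lod, x, y):
    """The four tiles at lod + 1 that subdivide tile (lod, x, y)."""
    return [(lod + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]


def parent(lod, x, y):
    """The tile at lod - 1 that contains tile (lod, x, y)."""
    if lod == 0:
        raise ValueError("the LOD 0 tile has no parent")
    return (lod - 1, x // 2, y // 2)


# Tile counts for the levels of detail 0 through 5.
for lod in range(6):
    print(lod, tiles_at_lod(lod))
```

The counts grow quickly: by LOD 5 there are already 1024 tiles, which is one reason a package that only stores the tiles it needs is attractive.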
The next section describes how tiles are stored in a VTPK file.
VTPK format
Like many other non-trivial file formats (.whl, .jar, etc.), a VTPK file is a .zip file under the covers. There are really two parts to the data inside a .vtpk file:
- The metadata. This is mostly stored in two json files inside the zip file: root.json in the p12 directory and root.json in the p12/tilemap directory.
- The feature data. When all is said and done, this data is stored in the Mapbox vector tile format, and fortunately there is already a Python library for reading this format: mapbox-vector-tile.
For the metadata, p12/root.json contains some of the expected stuff: extents of the data, which levels of detail are available, and so on. The p12/tilemap/root.json file specifies which tiles are present in the package; unlike Google Maps or OSM servers, which (probably?) have all the rendered tiles available at every level of detail, the data in VTPK files is sparse: not all tiles are present. For example, if you want your VTPK file to hold features relevant to the public transportation system in Boston, USA, you don’t need any tiles for the southern or the eastern hemispheres.
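Since the package is an ordinary zip file, both metadata files can be read with the Python standard library alone. This is a minimal sketch rather than the vtpk-reader implementation; the function name `read_vtpk_metadata` is made up for illustration, and only the two member paths come from the description above.

```python
import json
import zipfile


def read_vtpk_metadata(path):
    """Return (root, tilemap) parsed from the two metadata files.

    The member names are the ones described in the text; a package
    that lays things out differently will raise KeyError here.
    """
    with zipfile.ZipFile(path) as package:
        root = json.loads(package.read("p12/root.json"))
        tilemap = json.loads(package.read("p12/tilemap/root.json"))
    return root, tilemap
```

Calling `read_vtpk_metadata("some_file.vtpk")` (a placeholder path) returns two plain dictionaries, so the available keys can be explored with `root.keys()` and `tilemap.keys()` before relying on any of them.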
The JSON layout inside the index key in p12/tilemap/root.json mirrors how tiles are logically organized by levels of detail. The JSON entry corresponding to each tile can have one of three types of content: 0, meaning that tile data is not present; 1, meaning that data for the tile is present but there are no “children” subtiles at the next LOD; or a (four-membered) list, where each entry describes one of the four children tiles in the same way.
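The three cases above map naturally onto a small recursive walk. The sketch below rests on two assumptions that the description does not settle: that the value under the index key is the node for the single LOD 0 tile, and that the four children are listed in the order upper-left, upper-right, lower-left, lower-right. The function name `iter_leaf_tiles` is hypothetical, and it reports only nodes marked 1.

```python
def iter_leaf_tiles(node, lod=0, x=0, y=0):
    """Yield (lod, x, y) for every node marked 1 in a tilemap index.

    ASSUMPTIONS: `node` starts at the LOD 0 tile, and a four-member
    list orders its children upper-left, upper-right, lower-left,
    lower-right. Verify both against a real package before use.
    """
    if isinstance(node, list):
        if len(node) != 4:
            raise ValueError("expected four children per tile")
        for i, child in enumerate(node):
            # i % 2 selects the column half, i // 2 the row half.
            yield from iter_leaf_tiles(child, lod + 1, 2 * x + i % 2, 2 * y + i // 2)
    elif node == 1:
        yield (lod, x, y)
    # node == 0: no data for this tile, nothing to yield


# A tiny made-up index: one leaf at LOD 1 and one at LOD 2.
example_index = [1, 0, [0, 1, 0, 0], 0]
print(list(iter_leaf_tiles(example_index)))
```

With the made-up index above, the walk reports the tiles (1, 0, 0) and (2, 1, 2).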
The feature data itself is stored in .bundle files (see the Esri compactcache file specification in the links at the end of this post). The tile directory in the VTPK has a directory for each level of detail (L00, L01, etc.), and each of those directories contains one or more .bundle files, each of which holds data for one or more actual tiles. Reading the features for a particular tile requires:
- Identifying which bundle file the tile’s features are in: this is done via the naming of the .bundle files.
- Reading the .bundle file and the tile index inside it. This allows you to find the data for the tile you are interested in.
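The two steps above can be sketched as follows. This is based on a reading of the Esri compact cache V2 specification listed in the links, not on the vtpk-reader code, so every constant here is an assumption to verify: bundles covering 128x128 tiles, file names built from the hexadecimal row and column of the bundle's first tile, a 64-byte header, and 8-byte little-endian index entries that pack a 40-bit offset and a 24-bit size. The function names are hypothetical.

```python
import struct

BUNDLE_DIM = 128             # tiles per bundle side (assumed from the spec)
HEADER_SIZE = 64             # bytes before the tile index (assumed)
INDEX_ENTRY_SIZE = 8         # one little-endian uint64 per tile slot
OFFSET_MASK = (1 << 40) - 1  # low 40 bits hold the tile offset


def bundle_file_name(row, col):
    """Name of the bundle that should hold tile (row, col)."""
    base_row = (row // BUNDLE_DIM) * BUNDLE_DIM
    base_col = (col // BUNDLE_DIM) * BUNDLE_DIM
    return f"R{base_row:04x}C{base_col:04x}.bundle"


def read_tile_bytes(bundle_data, row, col):
    """Return the raw bytes stored for a tile, or None if its slot is empty.

    `bundle_data` is the full content of one .bundle file as bytes.
    """
    slot = BUNDLE_DIM * (row % BUNDLE_DIM) + (col % BUNDLE_DIM)
    (entry,) = struct.unpack_from(
        "<Q", bundle_data, HEADER_SIZE + INDEX_ENTRY_SIZE * slot
    )
    offset = entry & OFFSET_MASK
    size = entry >> 40
    if size == 0:
        return None
    return bundle_data[offset:offset + size]
```

The bytes returned for a tile are what would then be handed to a Mapbox vector tile decoder such as the mapbox-vector-tile library; whether they need any further unpacking first is something to check against a real package.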
It took a little while to understand how it all fits together, and the results are in the vtpk-reader GitHub repo. Hopefully it can be useful to others, perhaps even you, dear reader?
Links:
- vtpk-reader library https://github.com/kshklovsky/vtpk-reader
- Esri website https://www.esri.com/en-us/home
- ArcGIS website https://www.arcgis.com/index.html
- Vtpk search on arcgis.com https://www.arcgis.com/home/search.html?searchTerm=vtpk
- Web mercator projection properties https://epsg.io/3857
- Mapbox vector tile specification https://github.com/mapbox/vector-tile-spec/blob/master/2.1/README.md
- mapbox-vector-tile Python library https://github.com/tilezen/mapbox-vector-tile
- Esri compactcache file specification https://github.com/Esri/raster-tiles-compactcache/blob/master/CompactCacheV2.md