The road to SDMX (Statistical Data and Metadata eXchange): what is it, why use it, and which organizations support it?


The concept of SDMX is intrinsically linked to open data and APIs for accessing this data.

Open data is data that is freely and publicly available. Such data is increasingly made accessible through APIs (Application Programming Interfaces). An API returns data in response to a request for data, or ‘query’. An application uses an API by sending it queries, for example, for the population trend of the USA from 2000 to 2015. This means the application no longer needs its own database that has to be maintained and updated manually. Instead, the application sends requests to the API of the organization that manages the data, and that organization keeps the data up to date.
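To make the idea of a query concrete, here is a minimal Python sketch that asks the World Bank Indicators API for the population example above. It rests on several assumptions. The first is the version 2 URL pattern `country/{code}/indicator/{id}?date=start:end&format=json`. The second is the indicator code `SP.POP.TOTL` (Population, total). The third is that a successful JSON response is a two-element list: paging metadata followed by records carrying `date` and `value` fields. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Base URL of the World Bank Indicators API, version 2 (assumed endpoint).
WB_BASE = "https://api.worldbank.org/v2"


def build_wb_query(country: str, indicator: str, start: int, end: int) -> str:
    """Return a query URL for one indicator, one country and a range of years."""
    return (
        f"{WB_BASE}/country/{country}/indicator/{indicator}"
        f"?date={start}:{end}&format=json"
    )


def parse_wb_response(payload: List[Any]) -> Dict[str, Optional[float]]:
    """Map year -> value from a World Bank JSON response.

    A successful response is assumed to be a two-element list:
    paging metadata first, then the list of data records.
    """
    if len(payload) < 2:
        # Error responses are assumed to carry only a message object, no records.
        raise ValueError(f"unexpected response: {payload!r}")
    records = payload[1]
    # Missing observations are reported with a null (None) value.
    return {rec["date"]: rec["value"] for rec in records or []}


def fetch_series(country: str, indicator: str, start: int, end: int) -> Dict[str, Optional[float]]:
    """Send the query and parse the response (requires network access)."""
    url = build_wb_query(country, indicator, start, end)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return parse_wb_response(payload)
```

Calling `fetch_series("USA", "SP.POP.TOTL", 2000, 2015)` would perform the network request. Building the query and parsing the response are kept in separate functions so they can be checked without a connection. The API paginates its results, so longer ranges may need a larger `per_page` value or several requests.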

One such organization is the World Bank, which was one of the pioneers in making all of its data (7,000+ indicators covering virtually all countries) available for anyone to use through a public API. The data is updated several times a year and is obtained from the key custodians of international data in many domains, including health, the economy, education and population. The Apps for Development competition, held in 2010, was used to promote the new API. The winning entry was StatPlanet, a tool for automatically visualizing all country-level World Bank data through the API. Since then, many other key custodians of data have followed suit.

SDMX is sponsored by various institutions, including the World Bank, the International Monetary Fund (IMF), the OECD, the United Nations Statistics Division (UNSD), the European Central Bank (ECB) and Eurostat (the statistical office of the European Union).

What is SDMX?

When an API provides data, the data needs to be structured in a known way so that the receiving application can interpret it correctly. Initially, the most common API response formats were XML and JSON, with JSON being more practical for large amounts of data because it is more compact. More recently, SDMX has become a common standard among international organizations. SDMX stands for Statistical Data and Metadata eXchange and comes in several flavors, including SDMX-ML (XML-based) and SDMX-JSON (JSON-based). What SDMX provides is a common structure and nomenclature for data and metadata. Why this is useful is examined below.
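The SDMX-JSON flavor encodes observations compactly. Keys such as "0:1" are positional indexes into the lists of dimension values declared in the message's structure section. The minimal sketch below flattens such a message into one dictionary per observation. It assumes the SDMX-JSON 1.0 data-message layout: `dataSets` plus `structure.dimensions`, with dimensions attached either at series level or at observation level. Later versions of the format nest these sections differently. The function names are hypothetical.

```python
from typing import Any, Dict, List


def _decode_key(key: str, dimensions: List[Dict[str, Any]]) -> Dict[str, str]:
    """Turn a positional key such as "0:2" into {dimension id: value id}."""
    indices = [int(part) for part in key.split(":")]
    return {dim["id"]: dim["values"][i]["id"] for dim, i in zip(dimensions, indices)}


def parse_sdmx_json(message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the first data set of an SDMX-JSON 1.0-style message into rows."""
    dims = message["structure"]["dimensions"]
    series_dims = dims.get("series", [])
    obs_dims = dims.get("observation", [])
    dataset = message["dataSets"][0]
    rows: List[Dict[str, Any]] = []

    if "series" in dataset:
        # Series layout: series keys index series_dims, observation keys index obs_dims.
        for series_key, series in dataset["series"].items():
            series_part = _decode_key(series_key, series_dims)
            for obs_key, obs in series.get("observations", {}).items():
                # obs[0] is the value; any further entries are attribute indexes.
                rows.append({**series_part, **_decode_key(obs_key, obs_dims), "value": obs[0]})
    else:
        # Flat layout: every dimension sits at observation level.
        for obs_key, obs in dataset.get("observations", {}).items():
            rows.append({**_decode_key(obs_key, obs_dims), "value": obs[0]})
    return rows
```

Because the dimension names and codes travel inside the message itself, the same function works unchanged for any provider that publishes in this layout.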

SDMX is more than just a way to structure and format data and metadata: it also defines standards for the mechanisms and processes through which data is exchanged.
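One of those exchange mechanisms is the SDMX RESTful web service API. In this API, a data query is a URL of the form `{service}/data/{flow}/{key}`. In the key, dimension values are separated by dots, alternative values are joined with `+`, and an empty position acts as a wildcard. The minimal sketch below builds such URLs. It assumes the `startPeriod`/`endPeriod` query parameters for time filtering. Any service URL passed to it is a placeholder, and the helper names are hypothetical.

```python
from typing import Optional, Sequence, Union
from urllib.parse import urlencode

# A dimension may be fixed to one value, several alternatives, or left open (None).
DimValue = Union[str, Sequence[str], None]


def build_key(dimension_values: Sequence[DimValue]) -> str:
    """Build an SDMX REST series key from values given in dimension order."""
    parts = []
    for value in dimension_values:
        if value is None:
            parts.append("")  # empty position = wildcard (any value)
        elif isinstance(value, str):
            parts.append(value)
        else:
            parts.append("+".join(value))  # '+' = logical OR of alternatives
    return ".".join(parts)


def build_data_query(base_url: str, flow_ref: str, key: str,
                     start_period: Optional[str] = None,
                     end_period: Optional[str] = None) -> str:
    """Assemble {base_url}/data/{flow_ref}/{key} with optional period filters."""
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"
    params = {}
    if start_period is not None:
        params["startPeriod"] = start_period
    if end_period is not None:
        params["endPeriod"] = end_period
    return f"{url}?{urlencode(params)}" if params else url
```

For instance, `build_key(["A", ["USD", "JPY"], None])` yields `A.USD+JPY.`. That key fixes the first dimension, allows two values for the second, and accepts any value for the third. The same pattern works against any SDMX-compliant service, which is the point of standardizing the exchange mechanism.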

Why SDMX?

SDMX has become an international standard precisely because of what a standard offers: a common way of structuring both data and metadata. This means that any application using SDMX data knows in advance what format of data to expect. Once an application works with SDMX data from one source, it can be made to work with SDMX data from any other source. This is much more efficient and less error-prone, since you no longer have to transform and reformat data to get it to work in a particular application. With plain XML and JSON there was no fixed structure, and different organizations used different structures. Getting an application to work with various XML and JSON outputs was therefore complex, slow and tedious work.

SDMX itself can seem complex and intimidating, especially if you look at the documentation, which is huge and full of complex terminology. However, for users of SDMX data (including app developers), what matters most is obtaining the data in a structure that is easy to interpret. In that respect, the SDMX-ML format is just like any other XML format, but with the advantage of a recognizable, standardized structure. (Generating queries to obtain data from an API can be tricky, but the queries are the same whether the response format is SDMX, XML or JSON. Some APIs provide a 'query builder' that can make this process easier.)
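As an illustration of that standardized structure, the minimal sketch below reads observations from an SDMX-ML 2.1 generic data message using only Python's standard library. It assumes the 2.1 generic namespace shown in the code and handles only that flavor. The structure-specific flavor puts dimension values in XML attributes under a dataset-specific namespace and would need a different reader. The function name is hypothetical, and values are returned as strings rather than converted to numbers.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Namespace of SDMX-ML 2.1 generic data elements (Series, SeriesKey, Obs, ...).
NS = {"generic": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic"}


def parse_sdmx_ml_generic(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Return one dict per observation: series key values, time period and value."""
    root = ET.fromstring(xml_text)
    rows: List[Dict[str, Optional[str]]] = []
    for series in root.iterfind(".//generic:Series", NS):
        # The series key lists one <Value id="..." value="..."/> per dimension.
        key = {
            v.get("id"): v.get("value")
            for v in series.iterfind("generic:SeriesKey/generic:Value", NS)
        }
        for obs in series.iterfind("generic:Obs", NS):
            dim = obs.find("generic:ObsDimension", NS)
            val = obs.find("generic:ObsValue", NS)
            # The observation dimension is usually the time period.
            dim_id = dim.get("id", "TIME_PERIOD") if dim is not None else "TIME_PERIOD"
            row = dict(key)
            row[dim_id] = dim.get("value") if dim is not None else None
            row["value"] = val.get("value") if val is not None else None
            rows.append(row)
    return rows
```

Any provider publishing SDMX-ML 2.1 generic data uses the same element names, so this one function can read responses from different organizations without per-source adjustments.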

SDMX is especially useful if you require country-level data, because the key custodians of international, country-level data already make their data available in SDMX format. An application that supports SDMX, such as StatPlanet, can therefore access tens of thousands of indicators, from Arable land to Urban population growth.

Which organizations and key custodians of data support SDMX?

SDMX has been around since 2004, but it has only started to gain traction in recent years. The key custodians of data currently supporting SDMX include the IMF, Eurostat (the statistical office of the European Union), the European Central Bank, the OECD, the World Bank, and a number of UN agencies and initiatives. Among these are the United Nations Statistics Division (UNSD), UNdata, the UNESCO Institute for Statistics (for data on education and culture), FAO (for data on food and agriculture) and UNICEF-DevInfo (for data relating to the Sustainable Development Goals, or SDGs).

Among government agencies, the Australian Bureau of Statistics and the Samoa Bureau of Statistics were among the first to add SDMX support.

See also:

StatSilk's blog