The opportunities and challenges of open data


When people think about ‘data’, especially in its modern context, agriculture may not be the first thing that crosses their minds. Yet agricultural data are some of the oldest: megalithic stone circles mark the yearly shift of the sun – and thus the seasons; the earliest cuneiform tablets from Mesopotamia record grain yields and livestock sales; the Domesday Book lists farmsteads and agricultural workers; and farmers’ almanacs, going back to the Middle Ages, record meteorological conditions and their effect on crops. Modern statistical science, which developed from a need to understand data, also owes a debt to agriculture – from Mendel’s rules of heredity, which are essentially statistical, to the great Ronald Fisher, who revolutionised statistics in the 20th century while working at Rothamsted Research, helping its scientists understand the wealth of agricultural data they were generating.

Just as the knowledge-sharing in agricultural almanacs was beneficial to farmers in the Middle Ages, open data has much to offer farmers and communities around the world in the 21st century. I was recently one of 700 delegates from around the world at the Global Open Data for Agriculture and Nutrition (GODAN) meeting, sponsored by the UN in New York. There was a lot of optimism about the potential of open data to be an important part of the solution to global problems of food poverty and poor nutrition, especially as the population increases and food production becomes more costly, due to climate change and uncertainties about energy and fertiliser production. Open data has the capacity to make a real difference in developing economies, which often leap-frog more developed economies: in many developing countries, the mobile smartphone was among the first telecommunication infrastructure to appear. Codifying data in the right formats can turn people who might not have access to some other basic infrastructures into genuine participants in economic activity.

In developed countries, the methods of engagement may differ but the effects should be the same. This is fundamentally about allowing information to flow from those who have knowledge to those who need it with minimum friction, freeing up capacity to innovate. For example, the activities of CABI, an organisation which holds statistics about plant pests and diseases, can help farmers understand what plant diseases they are coping with and how to adapt their farming methods to reduce the impact of disease. Other forms of communication can help farmers to find markets for their produce, get paid and find sources of the seeds best suited to their soil conditions. All of this is powered by open data.

Data is agnostic to whether the applications are in developing or developed economies. In developed economies with a technological advantage, data can support precision agriculture to get the best quality produce to market, while in developing economies it can help those who farm at small scales to understand how their yields can be improved. In the UK, for instance, 3D landscape models, originally generated by the Environment Agency to plan flood defences and mitigate flooding, have been made open and are being used by English sparkling wine producers to identify slopes with the best aspect and elevation for planting new vines. In emerging economies, simply sharing data on soil type, crop variety and yield can make a huge difference. Better access to data is a great leveller, a tool by which inequalities can be addressed.

The Open Data Institute suggests that open data is infrastructure for the digital economy. In an environmental context, ensuring that individuals have access to information empowers them to make decisions informed by evidence. In an agricultural context, open data allows information to be shared in ways that support a culture of continuous improvement.

The commitment to open data at GODAN is impressive, representing a global effort. The enthusiasm there was infectious, the examples of successes were compelling, and the genuine commitment of the delegates to make a difference in the challenges facing the planet was inspirational.

The UK has been a leading light in establishing GODAN, together with the USA and Kenya. Other countries, such as Germany, are now coming on board. In the UK we are trying to lead by example by making our own government data open by default, unless there is a compelling reason not to, for example if it contains personal information. After a sustained effort, over 40% of all UK government data now comes from Defra – over 12,000 datasets, a figure still rising – and much of this is about food and farming, or is at least relevant to the environmental outcomes affected by our agriculture. Anybody can access, use and share these data, and they are already being used in new and unanticipated ways, as innovators apply data intended for one purpose to problems in other areas – including the LiDAR data being used by wine producers.

As I emphasised in the two presentations I gave at the GODAN conference, open data is not enough on its own. There are two additional and essential steps which have to happen. The first is making sure there are tools to allow people to ‘see’ data much more easily. People need to be led through the crowded landscape of the inner workings of websites in ways that allow them to interrogate the data to answer questions relevant to them. This needs some smart thinking, including employing machine learning, where computers themselves learn from the questions people are asking and construct the algorithms needed to access the right data.

The second step is to develop use-case studies. These are illustrations of the ways in which using data has helped farmers and those working in agriculture. These are necessary because often those people involved don’t know what questions to ask of the data. If they never ask the questions then the knowledge residing in the data will never be mined and put to use. This is a much bigger problem than many people realise. Unless practitioners are primed to ask questions of data they will never know what they are missing.

Amid all the euphoria around the power of data, I was surprised by how little caution there was in the rhetoric emerging from the conference. For some, it was a case of open data at any cost. Implicit in their argument is the assumption that any restrictions or limits on use – whatever they might be – will greatly reduce the benefits of the data. In my talks I challenged this. Too many great technologies, especially many of those associated with synthetic biology, which should be revolutionising farming and food production across the globe, are sitting on the shelf unused. This is largely because the early gung-ho messaging when these technologies appeared sensitised people to potential (but largely spurious) disadvantages. Although we are more used to knowledge and information flowing to and from us than we perhaps are to having synthetic biology embedded in our lives, there are dangers in giving out the wrong messages.

Everybody leaves behind their digital smoke, and our own signatures sit within the clouds of data that power digital economies. For those who wish to, there are ways of using this to find out more about us than some people might wish. I am a great supporter of open data, but we need to make sure that people know where data about them are, how they are being used and by whom. At Defra, data practitioners go to great lengths to remove any personal data from agricultural datasets, but this remains a challenge for some datasets, such as data on the movement of animals. There is risk involved, just as in any activity, and it is important that those risks be acknowledged and managed. It is important to be open about open data.