8th February 2017

Takeaway the Lab: Data Analysis To-Go




 

Neal Thomas Barsch, MSc in Economics for Development (2016)

The digital universe by 2013 had grown to an estimated 4.4 zettabytes of total stored data [1]. This is 4.4 × 10¹² gigabytes, or about 660 million years’ worth of HD video. In the lab, or connected to the Internet, collected data is used to make predictions about my behaviour every day. Advertisements on Facebook are related to my “likes” or Internet cookies, and search suggestions appear based on what I have been reading. The network IP address I am connected to is location-tracked, allowing companies (however you feel about it) to track roughly where I am. Algorithms run instantly so that a product I shop for on Amazon appears as an advertisement on an unrelated site I subsequently visit. Yet data collection in the developing world is an entirely different story. Paper-based surveys and data collection are still the norm, and most analysis to date requires collecting data, then returning to the lab to analyse it. What if instead we could bring the lab along with us to the field? What if we could predict, live and offline in a rural village in the Philippines, the same things we do back in a lab?

My research centres on a cultural phenomenon in the Philippines called the “Sari-Sari” store. Literally translated, “Sari-Sari” means “variety-variety,” and these stores are much like a typical corner store: family owned, selling soft drinks, crisps, household goods, and other small food items. It’s estimated there are more than a million Sari-Sari stores spread across every corner of the island nation. For comparison, the entire UK has around 50,000 pubs. Accounting for the difference in geographical land area, imagine that for every pub you passed in Oxford or London there were instead sixteen pubs, and you would be on the level of Sari-Sari stores [2]. This ubiquity is massively powerful in its potential to solve some of the biggest problems in the Philippines. What if these stores could provide crucial financial services to unbanked populations? What if people could save for their children’s education safely and securely, build up capital to use as collateral for loans, and save to start businesses? And perhaps most importantly, how do we lower the cost of establishing and providing these services to the level where banks will actually be interested, consumers pay little or no fees, and the market is sustainable?
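For readers who want to check the "sixteen pubs" comparison, here is one way the arithmetic could work out. The store and pub counts come from the text above; the two land areas are approximate round figures assumed for this sketch and are not taken from the article or its sources.

```python
# Counts quoted in the article.
SARI_SARI_STORES = 1_000_000   # "more than a million"
UK_PUBS = 50_000               # "around 50,000"

# Approximate land areas in square kilometres (assumptions for this sketch).
PHILIPPINES_AREA_KM2 = 300_000
UK_AREA_KM2 = 243_000

# Density of outlets per square kilometre in each country.
stores_per_km2 = SARI_SARI_STORES / PHILIPPINES_AREA_KM2
pubs_per_km2 = UK_PUBS / UK_AREA_KM2

# How many Sari-Sari stores there are per UK pub, area for area.
density_ratio = stores_per_km2 / pubs_per_km2
print(f"Sari-Sari stores per UK pub, adjusted for area: {density_ratio:.1f}")
```

Under these assumed areas the ratio lands close to the sixteen quoted above; slightly different area figures would shift it a little in either direction.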

It is this last point that interests me most in my research into prediction models that work live and offline in the field. Not every Sari-Sari store will be a good mobile banking branch, and it’s difficult and costly in the traditional model to find and recruit the right ones. We have to find the trusted stores in the community, and furthermore stores that are used to dealing with cash. Culturally, it is extremely tough to go to a random Sari-Sari store, especially in a tight-knit rural community, and ask ‘how much cash do you have?’ or ‘how much does your community trust you?’ What we can do is take account of the stock and variety of the store, ask how often the store restocks, what people buy most often, the business hours of the store, and even take into account the materials from which the store is built. With this information (what econometricians would call proxy variables), we can quickly build a picture of how the store fits into the community and functions on a day-to-day basis without asking sensitive financial or personal questions.
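To make the idea of proxy variables concrete, here is a minimal Python sketch of how observable store features might be combined into a single probability. It assumes a logistic regression, which is only one possible choice of model; the variable names and every coefficient are hypothetical and are not the ones used in the research.

```python
import math

# Hypothetical coefficients. In practice these would be estimated from
# earlier survey data; the values here are invented for illustration.
COEFFICIENTS = {
    "intercept": -2.0,
    "product_variety": 0.03,     # number of distinct products on the shelves
    "restocks_per_month": 0.25,  # how often the owner restocks
    "open_hours_per_day": 0.08,  # business hours
    "concrete_walls": 0.6,       # 1 if the store is built of concrete, else 0
}


def fit_probability(store: dict) -> float:
    """Return the modelled probability that a store suits recruitment."""
    z = COEFFICIENTS["intercept"]
    for name, weight in COEFFICIENTS.items():
        if name != "intercept":
            z += weight * store[name]
    # The logistic function maps the linear score onto the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Example: a well-stocked store with long opening hours.
busy_store = {
    "product_variety": 60,
    "restocks_per_month": 8,
    "open_hours_per_day": 14,
    "concrete_walls": 1,
}
print(f"Estimated fit probability: {fit_probability(busy_store):.2f}")
```

None of the inputs asks about cash holdings or trust directly; each is something a field worker can observe or ask about without causing offence.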

The algorithmic models we use are built into the tablet surveys themselves and constantly predict and assess the survey in the background (the math of the regressions runs live and offline on the tablet). The survey then automatically determines the probability that certain questions will be relevant, skips irrelevant questions, emphasises relevant ones, and stops when a store is either clearly a fit (or clearly not) for recruitment into the mobile financial services programme. This not only makes assessment easier for field workers, as the survey recruits stores automatically based on responses, but also saves them time by skipping irrelevant questions, allowing for more efficient and sustainable recruitment.
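The stopping behaviour described above can be sketched in a few lines of Python. This is one possible approach and not necessarily how the tablet surveys work: after each answer, it computes the lowest and highest score the store could still reach and stops once the decision can no longer change. The questions, weights, answer ranges, and thresholds are all hypothetical.

```python
import math

# Hypothetical questions: (name, weight, smallest answer, largest answer).
QUESTIONS = [
    ("restocks_per_month", 0.5, 0, 8),
    ("open_hours_per_day", 0.125, 0, 16),
    ("product_variety", 0.01, 0, 100),
    ("concrete_walls", 0.5, 0, 1),
]
INTERCEPT = -4.0
ACCEPT, REJECT = 0.8, 0.2  # hypothetical probability thresholds


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def run_survey(ask):
    """Ask questions until the outcome is certain.

    `ask(name)` returns the numeric answer to one question.
    Returns (decision, list of questions actually asked).
    """
    score = INTERCEPT
    remaining = list(QUESTIONS)
    asked = []
    while True:
        # Best and worst final scores given the questions not yet answered.
        low = score + sum(min(w * lo, w * hi) for _, w, lo, hi in remaining)
        high = score + sum(max(w * lo, w * hi) for _, w, lo, hi in remaining)
        if sigmoid(low) >= ACCEPT:
            return "recruit", asked   # clearly a fit: skip the rest
        if sigmoid(high) <= REJECT:
            return "decline", asked   # clearly not a fit: skip the rest
        if not remaining:
            return "review", asked    # all answered, still borderline
        # Ask next the question that could move the score the most.
        remaining.sort(key=lambda q: abs(q[1]) * (q[3] - q[2]), reverse=True)
        name, weight, _, _ = remaining.pop(0)
        score += weight * ask(name)
        asked.append(name)


answers = {"restocks_per_month": 8, "open_hours_per_day": 16,
           "product_variety": 0, "concrete_walls": 0}
decision, asked = run_survey(answers.__getitem__)
print(decision, asked)
```

In this sketch a question counts as irrelevant once no possible answer to it could push the store across either threshold, which is why strong or weak candidates finish in fewer questions than borderline ones.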

The progress made with the tablet models doesn’t stop with the recruitment analysis. If the store fits into the model, then the tablet can be programmed to display a recruitment video, answer frequently asked questions, and take application forms for any programme, including for the mobile banking pilot. These processes are automatic and require no return-to-office assessment, and the automation makes it extremely easy for field workers to use the surveys with minimal cost and training. Going even further with the “field-lab” capability, we assess and build cell signal maps using the same tablets the field workers carry to take surveys. The tablets are programmed to record the GPS coordinates, cell signal strength of each SIM card (we use dual-SIM devices to build two network maps at once), tower ID the tablet is connected to, and other network information every 20 metres the tablets move (all automatic and in the background). When we go into the field, the lab now truly comes with us.
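The distance-triggered logging can be illustrated with a short Python sketch. It assumes the tablet already supplies a stream of GPS fixes with signal readings attached, and it keeps a reading only once the device has moved at least 20 metres from the last one kept. The field names (`signal_dbm`, `tower_id`) are hypothetical, and the haversine formula is used here as a standard way to measure distance between coordinates, not as a statement of what the tablets actually run.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
SAMPLE_SPACING_M = 20       # spacing quoted in the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def select_samples(fixes, spacing_m=SAMPLE_SPACING_M):
    """Keep the first fix, then each fix at least `spacing_m` from the last kept."""
    kept = []
    for fix in fixes:
        if not kept or haversine_m(kept[-1]["lat"], kept[-1]["lon"],
                                   fix["lat"], fix["lon"]) >= spacing_m:
            kept.append(fix)
    return kept


# Hypothetical walk northwards, one fix roughly every 11 metres.
walk = [
    {"lat": 0.0,    "lon": 0.0, "signal_dbm": -85, "tower_id": "A"},
    {"lat": 0.0001, "lon": 0.0, "signal_dbm": -87, "tower_id": "A"},
    {"lat": 0.0002, "lon": 0.0, "signal_dbm": -90, "tower_id": "A"},
    {"lat": 0.0003, "lon": 0.0, "signal_dbm": -92, "tower_id": "B"},
    {"lat": 0.0004, "lon": 0.0, "signal_dbm": -95, "tower_id": "B"},
]
samples = select_samples(walk)
print(len(samples), "of", len(walk), "fixes kept")
```

On a dual-SIM device each kept fix would simply carry one signal reading per SIM, so a single walk could feed two network maps.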

So far, the prediction models I have built have been used to survey nearly five thousand Sari-Sari stores in the Philippines, collect over a million cell signal data points, and recruit hundreds of stores into pilot programmes for business training and mobile money projects.

These applications of the “field-lab” technology are only scratching the surface of what is possible. Disaster prediction models can be built into tablets (and already are to some extent) to direct relief workers to the most affected areas, on a house-by-house basis, using built-in modelling and satellite imagery assessments. On the microenterprise side, models will be able to direct relief and predict where stores could serve as immediate aid points for affected areas, meeting urgent food and water needs. International organisations would then be able to contribute funds directly to stores so they can immediately distribute these products to the community. In education, homework built with the technology will be able to predict exactly where individual students’ weaknesses lie, then focus the curriculum and tailor lessons accordingly. In business, prediction algorithms could be used to assess business opportunities in impoverished areas, tailored to reflect the actual situation of each area, which would help families pull themselves out of poverty. The possibilities are truly endless. The field, rather than the lab, is the new frontier of data technology.

 

[1] International Data Corp. as cited by Science and Technology Research News (https://goo.gl/lFcLgK)

[2] UK Campaign for Real Ale Pub Tracker Number (https://goo.gl/eJrBho)

 

