May 29, 2020

Of course, there is an infestation of sorts of well-intended COVID-19 dashboards and data sources across the world. I waited to find the right dataset to enhance without relying too heavily on data architecture and logistics.

Rest of the Story:

That would mean relying on an omnipresent source that uses real data and has been grounded in the trust of the global internet for decades. The challenge in picking Wikipedia as a data source, however, is its unstructured nature. Even the capabilities offered by its APIs are pretty sparse as far as handling individual pages is concerned. This is great for community-built content that the entire global population consumes, but at the same time it makes traversing the different permutations of the structure complex. However, there is a pattern to the chaos, and I was able to tap into a micro-fraction of it to formulate COVID-19 information.

Quick usage of the dashboard

I tried to overcome some of the gaps I had seen in the available dashboards:

a) Navigation and traversal of geographies - text-based data tables are great for first-time viewing but hinder repeated visits because of a subconscious mental load: the expectation that there will be some typing and wait time before getting to the relevant information. This is highly dependent on the geography and the context, but personally I could not see myself going back to text-heavy dashboards.

b) Data refresh rate - Time is a significant variable in a fluid situation like a pandemic, and a daily refresh falls a little short of an instant web-page refresh that reflects the latest, again thanks to everyone tirelessly keeping the information current.

Wiki parsing goes through their API, which returns the parsed page as JSON:

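import com.google.gson.JsonObject;
import com.google.gson.JsonParser; // from gson
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.HttpsURLConnection;

// Returns the page HTML found under parse -> text -> * in the API response.
private static String connectWiki(String page) throws IOException {
    // Build URL (the WIKI_* constants are defined elsewhere in the class)
    StringBuilder wikiApiUrl = new StringBuilder();
    wikiApiUrl.append(WIKI_URL_PROTOCOL).append(WIKI_API_PREFIX).append(WIKI_API_ACTION).append("&page=")
            .append(URLEncoder.encode(page, "UTF-8")).append("&format=").append(WIKI_API_FORMAT);
    // Connect
    HttpsURLConnection request = (HttpsURLConnection) new URL(wikiApiUrl.toString()).openConnection();
    request.setConnectTimeout(5000);
    request.connect();
    // Parse, closing the response stream once the JSON has been read
    try (Reader reader = new InputStreamReader(request.getInputStream(), StandardCharsets.UTF_8)) {
        JsonObject rootobj = new JsonParser().parse(reader).getAsJsonObject();
        return rootobj.get("parse").getAsJsonObject().get("text").getAsJsonObject().get("*").getAsString();
    }
}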
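
For reference, here is a minimal sketch of how that URL comes together. The constant values below are illustrative assumptions based on the public MediaWiki `action=parse` endpoint, and `WikiUrlSketch` and `buildUrl` are hypothetical names used only for this example, not the exact ones in the dashboard code. The page title is just an example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class WikiUrlSketch {
    // Assumed values, for illustration only.
    static final String WIKI_URL_PROTOCOL = "https://";
    static final String WIKI_API_PREFIX = "en.wikipedia.org/w/api.php?";
    static final String WIKI_API_ACTION = "action=parse";
    static final String WIKI_API_FORMAT = "json";

    // Concatenates the pieces in the same order as connectWiki does.
    static String buildUrl(String page) {
        try {
            return WIKI_URL_PROTOCOL + WIKI_API_PREFIX + WIKI_API_ACTION
                    + "&page=" + URLEncoder.encode(page, "UTF-8")
                    + "&format=" + WIKI_API_FORMAT;
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("COVID-19 pandemic in India"));
    }
}
```

`URLEncoder` turns the spaces in the page title into `+`, which is valid in a query string, so titles with spaces can be passed in as they appear on the page.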

Considering the volatile nature of Wikipedia edits, it requires regular tweaks to adjust to the 'Wikitable' structures for a few countries and to keep up with user content edits.
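
To give a feel for what that adjusting involves, here is a minimal sketch of pulling the rows out of the first 'wikitable' in the HTML string returned by `connectWiki`. It is not the dashboard's actual parser: the names `WikiTableSketch` and `firstWikitable` are hypothetical, the sample values are made up, and it sticks to the JDK's regular expressions to stay dependency-free. Regex-based HTML handling breaks on nested tables and unusual markup, so a proper HTML parser would likely be the sturdier choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiTableSketch {
    private static final int FLAGS = Pattern.DOTALL | Pattern.CASE_INSENSITIVE;
    // First <table> whose class attribute contains the word "wikitable".
    private static final Pattern TABLE = Pattern.compile(
            "<table\\b[^>]*class=\"[^\"]*\\bwikitable\\b[^\"]*\"[^>]*>(.*?)</table>", FLAGS);
    private static final Pattern ROW = Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", FLAGS);
    private static final Pattern CELL = Pattern.compile("<t[hd]\\b[^>]*>(.*?)</t[hd]>", FLAGS);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns one list of cell texts per table row; empty if no wikitable is found.
    static List<List<String>> firstWikitable(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher table = TABLE.matcher(html);
        if (!table.find()) {
            return rows;
        }
        Matcher row = ROW.matcher(table.group(1));
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Strip links and other inline markup, keep only the visible text.
                cells.add(TAG.matcher(cell.group(1)).replaceAll("").trim());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample markup, not real figures.
        String html = "<table class=\"wikitable sortable\">"
                + "<tr><th>Region</th><th>Cases</th></tr>"
                + "<tr><td><a href=\"/wiki/Region_A\">Region A</a></td><td>123</td></tr>"
                + "</table>";
        System.out.println(firstWikitable(html)); // prints [[Region, Cases], [Region A, 123]]
    }
}
```

Every country page lays its table out a little differently (extra header rows, footnote markers, merged cells), which is where the per-country tweaks come in.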

API

I then exposed all of this processed data through a simplistic API that I probably spent an hour building the back-end for. It does, however, require sending over a free API key, just to have a little bit of control, considering I am running this on my personal website with limited resources. And here is why this isn't open-source right off the bat:

1. I do not consider myself a UI / UX expert and hence am reliant on modern responsive web themes; for this I use a combination of 2 paid themes. Open-sourcing them would not be compliant with their licences.

2. The code is not too great and is littered with hackish patches that I plan to clean up before I embarrass myself more all over the internet.

However, if this dashboard and / or API helps even 1 person out anywhere in the world, then it will have done its job.

Next

This gave me some idea of the amount of effort that a bit of structuring, collating and processing of Wikipedia's wealth of information takes. I do have some ideas to generalize the processing logic and put some abstraction around it, so that it can be further put to use to categorically structure information on other subject areas as well.

Please leave a comment here or hit me up on Twitter if there is anything that we can talk about.


Nitin Pathak

I write about life's intersections with technology and life as it happens.
