Are your data too slow?

Not everything can be Big Data. Not everything should be, either. But some data do need a kick in the pants, so to speak. Are the data you produce or use real time, coming down the pipe as a feed every day, or are you stuck with years-old data for your planning and analysis purposes? If you're in the latter group, don't feel bad: you're not alone.

For those tracking Ebola outbreaks in West Africa, the stream of data is steady but not real time, yet decisions that impact people's lives are being made every day about resourcing and responding to this crisis. In the USA there are similarly important data needs: many infections and diseases are notifiable, requiring direct notification of the Centers for Disease Control and Prevention. But records of regular hospital visits, treatments and surgeries go through a very big, very slow pipeline from local clinics and hospitals up to the state agency level, and after processing, refining and some magical treatment, these data flow back to local public health and research agencies some years later. Traditionally this timeline was "all we could do" because of technology limitations and other reasons, but as we rely more and more on near real-time data for so many decisions, health data often stand out as a slouch in the race for data-driven decisions.

In a different vein, campaign finance data for political donations are sometimes surprisingly fast. In California, all donations to campaigns require the filing of a Form 460 declaring who gave the funds, their employer and zip code. Campaigns are supposed to file these promptly, but this does not always happen until certain filing deadlines. Nevertheless, these data contain valuable insights for voters and campaigns alike. They get submitted as a flow, but then end up in a complex format not accessible to average people – until someone changes that. A volunteer team at OpenOakland created a powerful automation process that takes these data and reformats them in a way that makes them accessible and understandable to everyone. Yet even this system of automated data processing and visualization suffers from a lack of perfectly updated data on a daily basis: the numbers shown each day only reflect the data filed to date, so big donations or changes in patterns do not show up until they are filed – often at a somewhat arbitrary deadline.
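The "filed to date" problem is easy to see in code. Here is a minimal sketch, not the OpenOakland system itself: the field names and sample records are made up for illustration and do not reflect the official CAL-ACCESS schema, but the core idea is the same – a dashboard can only sum up the contributions that have actually been filed by a given day.

```python
# Hypothetical sketch of totaling Form 460 contributions "as filed to date".
# Committee names, amounts and dates are invented; real filings use the
# state's own schema and identifiers.
from collections import defaultdict
from datetime import date

filings = [
    {"committee": "Committee A", "amount": 500.0,  "filed": date(2014, 6, 30)},
    {"committee": "Committee A", "amount": 2500.0, "filed": date(2014, 9, 30)},
    {"committee": "Committee B", "amount": 100.0,  "filed": date(2014, 9, 30)},
]

def totals_as_of(filings, cutoff):
    """Sum contributions that had actually been filed by `cutoff` --
    i.e. what a dashboard could honestly show on that day."""
    totals = defaultdict(float)
    for f in filings:
        if f["filed"] <= cutoff:
            totals[f["committee"]] += f["amount"]
    return dict(totals)

# In mid-July, Committee A's big September donation is invisible:
print(totals_as_of(filings, date(2014, 7, 15)))
print(totals_as_of(filings, date(2014, 10, 1)))
```

The gap between the two printed totals is exactly the distortion described above: until a donation is filed, no amount of automation can surface it.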

Unfortunately, not all data are filed frequently, and most do not come with an easy-to-use API that lets developers and researchers connect to them directly. Take crime data: very important information, in high demand for all sorts of decisions at the local level. Your police force may publish good crime data each day, or maybe just each month, which is useful for real estate people and perhaps for analysts and crime investigations. But how do we know if our local efforts have successfully impacted crime? We go to national data. The Federal Bureau of Investigation (FBI) collects data from most law enforcement agencies in the country and publishes it as the Uniform Crime Reports (UCR). Unfortunately, these data are published years after the fact. There is a convoluted process for local agencies to format and filter their reports, and then these data take years to get published.

We recently created a violent crime fact sheet using the latest available (and recently published) UCR data – for 2012. This lag means that county supervisors and other officials trying to evaluate the impact of crime prevention efforts can't even compare their outcomes with other cities: we have to wait two more years to see whether these indicators changed in comparable cities, or whether our interventions had a measurable impact. This sort of time lag means that no local officials have good comparable data in a reasonable time frame – a poor system for modern policy makers to rely on. The FBI is slowly working to implement a newer system, but it is not clear that the lag will improve.

Every agency responsible for collecting data for operational purposes MUST start thinking about how it can make these data safely available to decision makers and the public on an expedited timeline. The technology to support this is now very accessible, and if necessary we should consider bifurcated approaches: the old, slow feed to state and federal agencies and a new, agile feed for local use. Privacy standards and data quality should guide how we do this; they are not actual barriers unless we choose to let them be.

Government is a business, albeit one with a monopoly on the services it provides – and it's not cool for government to be making decisions using years-old data when the private sector is increasingly data driven and real time. We can do this!

* First published over at Govloop

Beyond compliance, beyond reports: Data for Action

First posted here.

A week ago the famous Napa region was shaken by a magnitude 6.0 earthquake, resulting in serious damage to buildings, injuries and disruptions to services across a large area. This is something residents of the Bay Area have come to expect, and we are all waiting for the next "big one", overdue in most experts' opinion.

The same week, our team launched a new app in response to the disaster.

Oakland is a city with a severe housing shortage, building anger toward gentrification and the unmeasured but very real displacement of low-income residents who have called this city home for decades. It is also home to 1,378 large apartment buildings at varying risk of collapse in a quake centered closer to Oakland. The City of Oakland and the Association of Bay Area Governments (ABAG) have studied this issue: over half of these buildings have been screened, but over 550 remain to be screened for risk. Many homes have been found to be safe, while 609 buildings (home to thousands of residents) have been found to be at serious risk. These are called potential soft-story buildings: they have a large open ground level, such as storage or garages, that could collapse in a quake, rendering those homes uninhabitable. That would be an instant loss of thousands of affordable housing units protected under rent control, and any units built to replace them will surely not be affordable, resulting in a very rapid push-out of poorer residents.

So why do we civic hackers care about this? It's a matter of equity, and a matter of many residents lacking good access to information relevant to their living situation: without information, no one can act. Unfortunately, the common practice in government is to collect information and store it neatly in a report that floats around city hall as a PDF. The data live on a server somewhere, called on only when needed. We greatly respect the proactive screening work the City and ABAG have done, but a large number of homes remain unscreened, and there are still thousands of renters with no idea of their risk – whether through damage and injury, or through displacement after the quake as a result of rent increases applied by landlords passing on retrofitting costs. Oakland's rent control policy sadly does not clarify whether seismic retrofitting costs are borne by the landlord, the tenant or both.

Some months ago we convinced ABAG and the City of Oakland to publish the data from these surveys – a complicated inventory because of the changing status of buildings as they are screened and retrofitted. We had been planning to build a new app to raise awareness of this issue and spur action, both by tenant rights groups and by the city, to determine a policy for handling these costs and to ensure homes in Oakland are safe for residents. After the quake we realised it was an important moment to raise the issue, so we sprinted to release a new app that helps renters and homeowners see the status of their building.

Our approach is to build tools that put information in the hands of the public in a way they can act on. In this case, the formal report is a good document, but it serves policy makers only; it does not inform or empower those living in these homes. This simple app lets any resident see how their building is rated: exempt (not a soft-story building), retrofitted and safe, or potentially soft-story and at risk in a big quake.
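At its core, an app like this is just a lookup from an address to a screening status. The sketch below is purely illustrative – the addresses, statuses and field layout are invented, not the actual City of Oakland / ABAG inventory or our app's code – but it shows how little machinery is needed once the data are published.

```python
# Illustrative sketch only: mapping an address to its screening status.
# Addresses and statuses here are made up; the real data come from the
# published City of Oakland / ABAG soft-story inventory.
inventory = {
    "100 MAIN ST": "exempt",                 # not a soft-story building
    "200 OAK AVE": "retrofitted",            # screened and made safe
    "300 ELM ST":  "potential soft story",   # at risk in a big quake
}

def building_status(address):
    """Return the screening status for an address, normalizing case and
    whitespace; buildings missing from the inventory are unscreened."""
    return inventory.get(address.strip().upper(), "not yet screened")

print(building_status("300 Elm St"))
print(building_status("999 Hill Rd"))
```

The interesting work is not the lookup itself but keeping the inventory current as buildings are screened and retrofitted – which is exactly why we pushed to have the underlying data published rather than locked in a report.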

We've advocated for open data with local governments for this very reason (and others). Data can be used to fill up reports with snippets and summaries that help decision makers, but the default should be open for all data that have no legal reason to be protected. This information, in the hands of those actually affected by it, can do radically more than if it were still sitting on a government hard drive somewhere in city hall!