Data rich, analysis poor

Oakland has a new school superintendent, and I like him, partly because of a statement he dropped at a recent meeting of the Youth Ventures Joint Powers Authority. All the city and county heavy-hitters were there, discussing the possibility of hiring an out-of-state firm to produce a data report on Oakland. There was much debate about the need for such a report, the need for non-local data folks, and the quality of local data, but Antwan Wilson wonderfully cut through it all: “Since I’ve come to Oakland I’ve seen a huge stream of data come across my desk. We don’t need more data. We’re not data poor; we’re analysis poor.”

I could have high-fived him for that statement. But that would have been awkward from across the room.

His point is one that I’ve harped on for some time in Oaktown. We have troves of data, but hardly anyone doing thoughtful analysis of them to inform decision making, policy, or evaluation (with the exception of some bigger programs that do get evaluated heavily). A similar incident highlights this even more starkly. In front of the county Board of Supervisors, a department chief was utterly stumped when asked for a seemingly simple, core metric about their department, after an injection of $75M over the past couple of years for a new program. I know that agency has tons of data, but I’m also aware of failed efforts to replace huge parts of their technology base and a stagnating effort to build a data team. So while I share some of their pain, ultimately it’s up to senior leaders to take this seriously and invest in the people and systems that make modern government agencies data smart, if not fully data driven.

Part of our current problem is that understanding of technology and data is very poor at the executive level, which often results in unwise lumping together of technology and data folks with little thought to whether they are the right people, with the right skills, to actually understand your operations. I’ve talked often about the need to integrate data analysts and researchers into regular agency strategy and planning so they can respond as needs arise, but this is also a higher-level problem, one that starts when those responsible for departments do not themselves have enough data savvy or technology awareness to make good initial decisions.

If you’re one of the data geeks or tech folks in government, a good way to both increase your value and help grow your organization is to add a layer of analysis or context when asked for simple data products. Instead of just handing over the numbers you were asked for, give some context about how they have changed over time, ways the measurement itself has changed, and gaps that make the data fuzzy. Or better yet, ask those annoying questions: “What is this being used for? What decisions are you trying to make? Can I help you digest this information at a planning meeting?” You’d be stunned at the number of executive-level meetings where people say “I don’t really know what these data mean” or “I wish we knew some context around these data,” but never pass those issues down to you. Make the case that you can produce better products, and help with the analysis, if you are part of the process.

For leaders, humility and awareness of how much data and tech really drive the world is a powerful starting point. Look at what other progressive agencies are doing with performance management, accountability, and data driven initiatives. Copy them. And perhaps most important, find a local ally who knows data driven strategies and technology management in their sleep, and have them help you make better decisions. One last clue: buying business analytics software won’t help you; training your staff properly and building your capacity by hiring data- and tech-savvy staff will!

Data: It’s all about people, not the data

I’m a data geek. I’ll own that. I love what data can do, what it can inform, what it can tell me.  I constantly find myself mentally connecting conversations I’m in and meetings I’m part of to the data that could best inform the discussion or the decisions. It’s a bit of a problem.

As our society and our government become slowly absorbed by the data deluge we’re now enabling, there is a righteous backlash from many who insist that data isn’t what it’s all about, that data are not more important than, say, people. Sometimes this is a valid and constructive point: the point of analysis is not the data, the results, or the visualization of those results; it’s what those data can do to inform decisions that have a human impact. Where I get frustrated is when people push back on the idea of using data proactively by arguing that “this problem isn’t about data, it’s not something we need data for, we already know what’s happening.” I hate those statements. They betray a level of arrogance that is not intentional but is real. Anyone who already fully knows the nuance and scale of a problem had better also have insights into the solutions; otherwise, what good has their knowledge and insight been to the people they care about helping?

This is another case of two sides acting as if only one side matters, and that is neither productive nor effective for most social issues. It’s next to impossible to get executive buy-in to change something with just experience and intuition; we don’t often see policy or investment decisions based on insight alone. Likewise, we should never be making serious decisions or assumptions based on data alone. That leads to decisions lacking critical context and nuance, and to simplistic technocratic solutions. Better to pair the data with the insights and experience of those living out those data.

Just as policies are often more successful when developed with the decision makers and implementers involved, so too should data driven decisions be constructed. A great local example of this in action appeared in the release of our latest report on attendance problems in Oakland Unified schools. Despite serious chronic absenteeism across the district, Garfield Elementary is one of six schools in Oakland that have cut chronic absences by half or more. The principal, Nima Tahai, said: “First, it’s data driven. You have to have the numbers in front of you, student names and down to the reasons for each absence… Then, school staff must engage in one-on-one work with families, reaching out to them to find out what is going on and talking to them about the importance of getting their kids to school.” He went on to say that Garfield administrators even pick up kids and drive them to school if a family is stuck without transportation or a parent is ill.

This problem would never have been raised to the community’s attention without thoughtful analysis of very detailed data on every student in the district. Data revealed the scale of the problem and then, in the hands of a skilled administrator, were used to identify individual points of influence or action: each student in need of help. The data alone would mean just a nice report or a compliance document. Delivered in a form that can support action, these data become powerful elements of change. Data, people, action. That’s how government should be driving change: data driven, not data obsessed.
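
To make that concrete, here is a minimal sketch of the kind of analysis that turns raw attendance records into an actionable list of students. Everything here is an assumption for illustration: the file name, the column names, and the 10% threshold (a common definition of chronic absence) are mine, not OUSD’s actual data or method.

```python
# A minimal sketch: flag students missing 10% or more of enrolled days
# (a common definition of chronic absence). Column names are illustrative
# assumptions, not the district's real schema.
import csv
from collections import defaultdict

def flag_chronic_absence(path, threshold=0.10):
    """Return students whose absence rate meets or exceeds the threshold."""
    days = defaultdict(lambda: {"enrolled": 0, "absent": 0})
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # assumed columns: student_id, status
            record = days[row["student_id"]]
            record["enrolled"] += 1
            if row["status"] == "absent":
                record["absent"] += 1
    return {
        student: record["absent"] / record["enrolled"]
        for student, record in days.items()
        if record["absent"] / record["enrolled"] >= threshold
    }

for student, rate in flag_chronic_absence("attendance_log.csv").items():
    print(f"{student}: absent {rate:.0%} of enrolled days")
```

The output is exactly the kind of artifact the principal described: not a summary statistic, but a list of names that staff can act on one family at a time.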

*First posted on Govloop.com

Are your data too slow?

Not everything can be Big Data. Not everything should be, either. But some data do need a kick in the pants, so to speak. Are the data you produce or use real time, coming down the pipe as a daily feed, or are you stuck with years-old data for your planning and analysis? If you’re in the latter camp, don’t feel bad — you’re not alone.

For those tracking Ebola outbreaks in West Africa, the stream of data is steady but not real time, yet decisions that impact people’s lives are being made every day about resourcing and responding to the crisis. In the USA there are similarly important data needs — many infections and diseases are notifiable, requiring direct notification of the Centers for Disease Control and Prevention. But data on regular hospital visits, treatments, and surgeries go through a very big, very slow pipeline from local clinics and hospitals up to the state agency level, and after processing, refining, and some magical treatment, these data flow back to local public health and research agencies some years later. Traditionally this timeline was “all we could do” because of technology limitations and other constraints, but as we rely more and more on near real-time data for so many decisions, health data often stand out as a slouch in the race toward data driven decisions.

In a different vein, campaign finance data on political donations are sometimes surprisingly fast. In California, every donation to a campaign requires the filing of a Form 460 declaring who gave the funds, their employer, and their ZIP code. Campaigns are supposed to file these promptly, though in practice many filings arrive only at certain deadlines. Nevertheless, these data contain valuable insights for voters and campaigns alike. They get submitted as a flow, but they end up in a complex format not accessible to average people — until someone changes that. A volunteer team at OpenOakland built a powerful automation process that takes these data and reformats them so they are accessible and understandable to everyone at http://opendisclosure.io. Yet even this system of automated processing and visualization suffers from imperfectly updated data: the numbers shown each day reflect only the filings received to date, so big donations or shifts in patterns don’t show up until they are filed — often at a somewhat arbitrary deadline.
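
As a rough illustration (not the actual OpenDisclosure pipeline), here is a minimal sketch of reformatting filed contribution records into running totals per committee. The file name and column names are assumptions; note that, just like the real site, the totals can only reflect filings received to date.

```python
# A hedged sketch of aggregating Form 460-style contribution records into
# running totals per committee. Column names are assumptions, not the real
# filing schema; totals reflect only what has been filed so far.
import csv
from collections import defaultdict

def summarize_contributions(path):
    """Aggregate contributions filed to date, grouped by committee."""
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["committee"]] += float(row["amount"])
    return dict(totals)

for committee, total in summarize_contributions("form460_export.csv").items():
    print(f"{committee}: ${total:,.2f}")
```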

Unfortunately, not all data are filed frequently, and most do not come with an easy-to-use API that lets developers and researchers connect to them directly. Take crime data: very important information, in high demand for all sorts of local decisions. Your police force may publish good crime data each day, or maybe just each month, which is useful for real estate people and perhaps for analysts and crime investigations. But how do we know whether our local efforts have actually impacted crime? We go to national data. The Federal Bureau of Investigation (FBI) collects data from most law enforcement agencies in the country and publishes them as the Uniform Crime Reports (UCR). Unfortunately, these data are published years after the fact. There is a convoluted process for local agencies to format and filter their reports, and then these data take years to get published.
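
For contrast, here is a sketch of what direct API access to local crime data could look like, if an agency offered it. The endpoint and field names are entirely hypothetical; few agencies publish anything like this, which is the point.

```python
# A hypothetical sketch of querying a local crime-data API and counting
# incidents per month. The endpoint URL and response fields are invented
# for illustration; they belong to no real agency.
from collections import Counter
import requests

API_URL = "https://data.example-city.gov/api/crime-incidents"  # hypothetical

def monthly_counts(year):
    """Count reported incidents per month for a given year."""
    resp = requests.get(API_URL, params={"year": year}, timeout=30)
    resp.raise_for_status()
    incidents = resp.json()  # assumed: a list of {"date": "YYYY-MM-DD", ...}
    return Counter(row["date"][:7] for row in incidents)

for month, count in sorted(monthly_counts(2014).items()):
    print(month, count)
```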

We recently created a violent crime fact sheet using the latest available (and only recently published) UCR data — for 2012. This lag means that county supervisors and other officials trying to evaluate crime prevention efforts can’t even compare their outcomes with other cities; we have to wait two more years to see whether these indicators changed in comparable cities, or whether our interventions had a measurable impact. No local official has good comparable data in a reasonable time frame, a poor system for modern policy makers to rely on. The FBI is slowly implementing a newer system, but it is not clear that the lag will improve.

Every agency responsible for collecting data for operational purposes MUST start thinking about how it can make those data safely available to decision makers and the public on an expedited timeline. The technology to support this is now very accessible, and if necessary we should consider bifurcated approaches: the old, slow feed to state and federal agencies, and a new, agile feed for local use. Privacy standards and quality are simply things that guide how we do this; they are not actual barriers unless we choose to let them be.
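
Here is a minimal sketch of the “agile local feed” half of that bifurcated approach: a daily extract with sensitive fields stripped before local release, while the full records continue up the slow state and federal pipeline. The field names and file layout are illustrative assumptions, not any agency’s real schema.

```python
# A minimal sketch of a privacy-scrubbed daily extract for local release.
# Field names are illustrative assumptions; the real scrubbing rules would
# come from an agency's privacy standards, as the text above notes.
import csv

PUBLIC_FIELDS = ["incident_type", "date", "neighborhood"]  # no names, no addresses

def publish_daily_extract(source_path, public_path):
    """Write a copy of today's records containing only releasable fields."""
    with open(source_path, newline="") as src, open(public_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=PUBLIC_FIELDS)
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({field: row[field] for field in PUBLIC_FIELDS})

publish_daily_extract("internal_records.csv", "public_extract.csv")
```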

Government is a business, albeit one with a monopoly on the services it provides, and it’s not cool for government to be making decisions with years-old data while the private sector is increasingly data driven and real time. We can do this!

* First published over at Govloop

Does the world really need another PDF report?

If you’re in government or academia, you have surely seen reports that sit on shelves and do nothing once they’ve been compiled. You may even have helped produce them. They often cost a lot yet yield very little. At the other end of the information delivery spectrum are powerful, dynamically adjustable web dashboards and interfaces that can be adapted as needed, but those don’t give us recommendations or let us answer deeper questions. Often what is really needed lies somewhere in between.

Consider your normal report deliverable: a PDF. Perhaps you get to provide input during a round of error checking and review once it’s completed; otherwise, all you gain as a client is a static document.

Quite often, once a consultant’s report is delivered, we realize we should have asked different questions, required more detail in certain areas and more context behind certain explanations, and that maybe some things were just not relevant in the end. By then we’re stuck with what we paid for, useful or not. The fact that it’s 2013 and we’re still thinking in static deliverables and ‘final’ anythings should be astonishing. How can we be smarter about data?

1. Don’t ask for a report. Asking for one assumes you know everything you will need to know up front, which is often false. A static report cannot adapt when you realize you asked the wrong question, or when you need to dig deeper into a single issue or data set.

2. Evolve. Consider the flow of information needed for a community planning process: a single dense report up front is simply a huge chunk of information that most people will ignore and few can absorb. Ask for data vignettes or fact sheets on specific aspects, delivered along the process timeline to meet needs as they evolve. As your understanding of your data needs changes, your data team must be there to support you at each stage.

3. Don’t silo or isolate your data folks (more on this in “Don’t Silo Your Geeks” below).

4. Iterate. Instead of a final delivery and review, adopt a more collaborative approach with your data team. Sit down and brainstorm the direction and details as they form; waiting for the final version means you’re stuck with it. You often discover that you need to dig deeper into a specific indicator, or to disaggregate to get to the really important stuff. That can’t happen at the end of a report process. Require your staff or consultants to plan for and provide multi-stage reviews. This way the data geeks get strong guidance from you, and you better understand the process of getting and analyzing the data.

5. Own your data. Or better yet, open your data. When you pay for a report you get just that: pages, in a PDF. As we encourage more government agencies to open their data for use by all, we need to do the same in our sector. When you contract for a report or research support, require the underlying data to come with it. That way you’re not locked into using the data however the consultant prepared it; you can work with it any way you need. If you’re a nonprofit or a government agency, consider opening the data for public use. You’ve paid for it, the hard work is done, and now you can provide an amazing resource to your community and stakeholders by publishing the data unearthed in your project. Data are the ultimate non-consumable resource! If you’ve obtained government data for your work, put it out there and make it available for others to benefit from as well. We work in a far too siloed sector. Why should ten local organizations have to expend the same resources to find the same data? When government data are ubiquitous and easy to find, our work is better, smarter, and cheaper.

We need to change how we think about information and informed processes. We need to be able to learn constantly and refine our knowledge over time. Static reports don’t allow us to do that. It’s time we wised up about what to ask for and when to ask for it. At the very least we need to be asking: “What is the actual value we get from one more PDF report?”

Don’t Silo Your Geeks

First posted on the Harvard Data Smart Cities site

I’ve just told another partner organization “Don’t silo your geeks!”  It’s about the tenth time this year that I’ve conveyed this message.

The way most organizations use research, mapping, and data is the same whether the analysis comes from an internal group or a contracted partner: you hand the data folks a well-formed plan, ask them to do some analysis, then take the results and go do the thinking work to implement a new plan or improve an existing effort. So what’s wrong with this model? Everything. It not only perpetuates a gross misunderstanding; it devalues your own staff and robs your organization of valuable insights.

When you take your broken car to a mechanic for repairs, you describe the symptoms and then leave them to do their thing. All good. Unfortunately, data analysis doesn’t follow a formula like issue + data + geek = result. By handing over a specification or fully formed plan for analysts to follow, you’re ignoring the fact that analysts know an enormous amount about what is possible, about best practices for indicators, methods, and communication styles, and about how to frame a research project so your goals are met. Data geeks actually happen to know a lot about your work, your issues, and how to think through a problem effectively. We’ve long treated data folks as simple number crunchers who know magic tricks and are best left alone to do their thing. That’s a serious misunderstanding.

This approach sends a message to your data team or consultant that you consider them useful only for the geekery, that they cannot possibly understand your problem or the end application of the data. When organizations maintain the stigma that data analysts are simply geeks who like tech, they ensure those very talented individuals will never truly reach their potential. Given that the average analyst brings problem-solving ability, critical thinking, and rare creativity, do you really think we’re using them best by siloing them away and perpetuating the geek stereotype?

More importantly, by isolating the data folks from your initial thinking process, your planning and brainstorming phase, and your research formation efforts, you ensure that your analysis is never as good as it should be.

Would you take your car to the mechanic with a detailed procedure to follow? Not likely; you’d consult with them and develop a plan that draws on their detailed knowledge and your broader mission (namely, keeping your car reliable). Then they execute and you receive the results. By engaging your data folks in the early phases of a project, you gain valuable perspective on what can be done, what would be problematic, and how best to frame the plan. You benefit from having the people who will execute your plan help to form it, ensuring that your ask is reasonable and your ideas can be executed. A weak plan is nearly impossible for any research group to turn into a useful end product. Involve your data partners in your thinking, strategy, and planning and you ensure a higher chance of success for your project.

Likewise, when you get your research report, data outputs, maps, or other results, don’t consider the role of your data geeks to be over. I’ve witnessed so many planning and implementation meetings where the folks in charge butcher the data analysis or misinterpret the maps, leading the effort down a bad path with less chance of the desired impact. Take the data geeks out of this stage and your chances of making similar mistakes are seriously amplified. Keep your data partner engaged in this crucial last stage: allow them to help shape the end result, and expect to surface further data questions that will require more work to answer.

A final benefit of keeping your data team involved at all stages is that you’re building their capacity and skills, giving them insights to better guide their phase of the work, strengthening your team, and allowing for more diverse, experienced voices in your efforts. That’s rarely a bad thing.

“A data-driven organization acquires, processes, and leverages data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.” – DJ Patil

What we do every day. We have a long way to go before local government and nonprofits are even close to data driven!