Open Data in the Capitol, aka the anti-PDF and “show us your data” day.

Today I spent my day sitting in Senate and Assembly hearing rooms in our (rather beautiful) state capitol to testify in favor of two new bills that would make open data more impactful in California.

It’s a bit of a departure from my normal tech scene, but for the first time a slew of open data bills is hitting the Legislature, and they are really important to the future of our state.

First up was SB 272 from Senator Hertzberg, a piece of legislation that requires local governments to conduct and publish an inventory of their data systems and contents. This one is big. I’m calling it the “Show us your data” bill.

Open data has been used by civic hackers to build countless new apps and to explain data in new ways: apps that help people find out whether they live in earthquake-safe homes, tools that make city budgets understandable for the first time, and services that help families find early childhood education and services easily, from the convenience of their phone.

SB 272 is an important piece of our state infrastructure. While many cities do have open data policies, these take some time to implement, and residents are left in the dark about what data their local government actually has and how it’s collected. This is important from a trust-building perspective for sure: knowing what data our police department collects is vital to understanding how that department works. We’ve had impassioned fights in Oakland over data privacy, yet most of this debate is happening without really knowing just how much and what kind of data our city really has.

Data inventories will empower residents and will lead to better quality public records requests- right now, if you don’t know what is collected, you are forced to make vague, uncertain requests, never sure if the data exist.  With public data inventories, our communities will be able to make informed requests. This builds trust and improves efficiency.

SB 272 is also important because there is too much opacity in local government contracting; making visible the exact systems and software used to manage these data will provide valuable intelligence to the business community.

Lastly, these inventories are critical because as technologists build ever more powerful and useful apps, we run into the problem of apps stopping at the city border because the data don’t exist, or aren’t obviously available, in the next city over. Knowing which data exist, and where, is a huge step forward in encouraging future innovation and making modern tools work for all of our residents.

Next up was AB 169 from Assemblymember Maienschein, a small piece of legislation that helps define what “open” means and firms up the standards by which data are published, when they are published. Let’s call this one the “Kill All PDFs” bill. It requires data or records to be published in machine readable, digital formats in ways that can be searched and indexed. This is good, though a PDF can technically be searched by Google, so the bill isn’t perfect from a geek’s perspective; still, it makes data publishing standards much clearer and more helpful for those of us using public data. Perhaps as important, this bill requires data to be published in the original structure where possible, putting pressure on government staff not to refactor or redact data files unless legally necessary.
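
To make the “machine readable” point concrete, here’s a rough sketch of what structured publishing buys you. The file and column names below are hypothetical (AB 169 doesn’t prescribe this exact format), but the point stands: a published CSV is usable in a dozen lines of standard-library Python, while the same table trapped in a PDF needs fragile extraction tooling before you can even begin.

```python
# A minimal sketch, assuming a hypothetical "city_expenditures.csv" with
# "department" and "amount" columns published under an AB 169-style rule.
import csv

totals = {}
with open("city_expenditures.csv", newline="") as f:
    for row in csv.DictReader(f):
        dept = row["department"]
        totals[dept] = totals.get(dept, 0.0) + float(row["amount"])

# Rank departments by total spending: trivial once the data are structured.
for dept, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{dept}: ${total:,.2f}")
```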

Both bills went through with unanimous support, off to the next stage of political machination. I’m hopeful they will see sunlight in the end; they are important pieces of our future. As this day wraps up, I’m left dwelling on one thing: every issue heard today had a slew of hired lobbyists representing the various interests, electric cars, solar systems, government contractors and more. I’ve been frustrated by the lack of movement and leadership at the state level in California regarding open data, and now I realize something-

Open Data has no lobbyist. 

Open Data and Policing Reform

It’s not often you get to testify to one of your most admired leaders, and to do so side by side with another amazing community leader you’ve respected and appreciated from afar; today I get to do both of these things, and I’m pretty much in awe of what a privilege this is. It’s harder and harder to leave my girls at home and travel, but for someone deep in the work of making cities better and more equitable at a local level, there are few chances to possibly influence change at a national level. And so I’m in Cincinnati, where it’s below freezing, away from sunny California, where yesterday I was drinking a good beer on the balcony with my wife, complaining it was too warm. In January.

Today I’m testifying to the President’s Task Force on 21st Century Policing, which includes Bryan Stevenson of the Equal Justice Initiative, a man who brought me to tears the last time I heard him speak. I don’t go in for hero worship, but Bryan would be on the short list.

What does a data geek and open government advocate have to say about the future of policing reforms in our country? My testimony is below…

Dear members of the task force and other community leaders,

I’m speaking today on behalf of Urban Strategies Council, a 28-year-old social justice organization in Oakland, California, where I have had the privilege of being the director of research and technology for almost eight and a half years. I am also here on behalf of OpenOakland, a civic innovation organization that I co-founded with Eddie Tejeda.

My work with the Council has provided me with an opportunity to see how a lack of transparency in local government affects data-driven decision making, government technology, and community engagement. I’ve had the chance to work with many local agencies and community based organizations to help them unearth their valuable data, to analyse it and put it in context, and then to help communicate the story and results of those data.

Traditionally the role of government has been perceived as a collector of data for compliance and reporting purposes, yet this is no longer sufficient in our view of 21st century government. Government now needs to be proactive across all agencies, especially those traditionally very closed and inaccessible. For many years we have been unearthing public data for research purposes and publishing these data openly for all to access, from data on local probationer populations to crime reports and foreclosure filings. When we obtained both open and private data and published a report on investor acquisitions of foreclosed homes, our work led to the creation of new laws to protect tenants and monitor housing purchases. When public data is put in the hands of communities, powerful things can happen.

We led an effort to crowdsource the legislation that made open data the law in Oakland, and now we have local agencies actively making data available to the public free of charge or restriction. This has led to breakthrough innovations such as OpenBudgetOakland.org, which, when shown to our city council, led to disbelief: never before had decision makers seen their own budget in such clear context, and the impact was powerful. Residents of our city were able to understand a complicated 16,000 line budget for the very first time, something made possible by opening data and by engaging the community in respectful collaboration. Hackers, city staff and advocates working together.

Another local example of what happens when government opens up valuable data and collaborates is our earthquake safety app (http://softstory.openoakland.org), which helps inform low income renters if they are living in a building susceptible to collapse in the next big quake. This app was built by the community as open source and is now being deployed in a nearby city.

You’ll notice I’ve not talked about great policing collaboration examples, for good reason. Despite generating a near real time flow of crime reports, our local police departments and sheriffs have not been eager to jump into the world of open data yet. Given the lack of trust in the Oakland Police Department, the need for real community policing and a dearth of accessible information about policing practices and incidents, Oakland is like most other US law enforcement jurisdictions in its need to embrace open data and to develop respectful collaborations and engagement that lead to innovation.

Given the way communities of color are impacted by crime and violence, and the number of officer-involved shootings and assaults on officers, there is a very real and urgent opportunity for data to be leveraged for their benefit. Right now there are activist groups building databases of all officer-involved incidents and homicides; these are duplicated efforts costing hundreds of hours of community time, from projects such as Oakland’s Shine in Peace to http://killedbypolice.net/ and http://www.fatalencounters.org/. These projects should be taken as a leading indicator of a huge and growing demand for better transparency in our law enforcement agencies – citizens are clamouring for data to inform decision making, policy reform and civic action. When communities across the country need to collect news reports of officer-involved shootings and homicides, we’re missing something. When stop and frisk data are hidden from public view and not available for community research and analysis, we’re missing something. When arrest information only sees sunlight in the form of aggregate yearly reports, we’re missing something. That something will be realized when local law enforcement adopts a policy of open by default and begins publishing (with some obvious legal limitations) record level data of all crime reports, arrests, uses of force and weapon discharges, along with stop and frisk incident data.

As individuals, we do not trust what we cannot see. Publishing data alone will not lead to better insights and operations; it is not a silver bullet for restoring community trust in police departments. However, in publishing these data on an ongoing basis, we make possible new, productive collaborations, new opportunities to work from reasonably objective shared facts, and innovations that we could not predict. My recommendation to this task force: make open by default the new norm for our police forces, support the open publishing of these data, encourage standard data formats and support these agencies taking a leading role to learn together and to work towards common goals. Toward a future where transparency is no longer a laughable concept when it comes to law enforcement, where communities trust the information coming out of police databases and where residents can see and understand patterns and problems for themselves. Then we can have informed debates and start to remake policing in the USA in ways we agree on.

Data: It’s all about people, not the data

I’m a data geek. I’ll own that. I love what data can do, what it can inform, what it can tell me.  I constantly find myself mentally connecting conversations I’m in and meetings I’m part of to the data that could best inform the discussion or the decisions. It’s a bit of a problem.

As our society and our government become slowly absorbed by the data deluge we’re now enabling, there is a righteous backlash from many who say that data isn’t what it’s all about, that data are not more important than, say, people. Sometimes this is a valid and constructive statement: the point of analysis is not the data, the results or the visualization of those results; it’s what those data can do to inform decisions that will have a human impact that matters. Where I get frustrated is when people push back on the idea of using data proactively by arguing that “this problem isn’t about data, it’s not something we need data for, we already know what’s happening”. I hate those statements. They betray a level of arrogance that is not intentional but is real. Anyone who already fully knows the nuance and scale of a problem had better also have insights into the solutions; otherwise, what good has their knowledge and insight been to the people they care about helping?

This is another case of two sides acting as if only one side is important, and that is not productive or effective for most social issues. It’s next to impossible to get executive buy-in to change something with just experience and intuition; we don’t often see policy or investment decisions based on insight alone. Likewise, we should never be making serious decisions or assumptions based on data alone. That leads to decisions lacking critical context and nuance, and to simplistic technocratic solutions. Better to pair the data with the insights and experience of those living out those data.

Just as policies are often more successful when developed with the decision makers and implementers involved, so too should data driven decisions be constructed. A great local example of this in action appeared in the release of our latest report, focused on attendance problems in Oakland Unified schools. Despite serious problems of chronic absenteeism across the district, Garfield Elementary is one of six schools in Oakland that have cut chronic absences by half or more. The principal, Nima Tahai, said: “First, it’s data driven. You have to have the numbers in front of you, student names and down to the reasons for each absence… Then, school staff must engage in one-on-one work with families, reaching out to them to find out what is going on and talking to them about the importance of getting their kids to school.” He went on to say that Garfield administrators even pick up kids to drive them to school if a family is stuck without transportation or a parent is ill.

This problem would never have been raised to the community’s attention without thoughtful analysis of very detailed data on every student in the district. Data revealed the scale of the problem and then, in the hands of a capable administrator, were used to identify individual points of influence or action: each student in need of help. The data alone mean just a nice report or a compliance document. When delivered in a form that can support action, these data become powerful elements of change. Data, people, action. That’s how government should be driving change: data driven, not data obsessed.
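
For the data geeks: the district’s actual analysis isn’t published as code, but a minimal sketch of the student-level rollup described above might look like this. The column names are guesses, and the 10% threshold reflects the common definition of chronic absence.

```python
# A hedged sketch of flagging chronically absent students from daily records.
# Assumes a hypothetical "student_attendance.csv" with one row per student per
# school day and an "absent" column of 0/1 values; these names are guesses.
import pandas as pd

attendance = pd.read_csv("student_attendance.csv")

per_student = attendance.groupby("student_id").agg(
    days_enrolled=("date", "count"),
    days_absent=("absent", "sum"),
)
per_student["absence_rate"] = per_student["days_absent"] / per_student["days_enrolled"]

# Chronic absence is commonly defined as missing 10% or more of school days.
chronic = per_student[per_student["absence_rate"] >= 0.10]
print(f"{len(chronic)} students flagged for follow-up")
print(chronic.sort_values("absence_rate", ascending=False).head(20))
```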

*First posted on Govloop.com

Are your data too slow?

Not everything can be Big Data. Not everything should be, either. But some data do need a kick in the pants, so to speak. Are the data you produce or use real time, coming down the pipe as a feed every day, or are you stuck with years-old data for your planning and analysis purposes? If the latter, don’t feel bad — you’re not alone.

For those tracking Ebola outbreaks in West Africa, the stream of data is steady but not real time, yet decisions that impact people’s lives are being made every day about resourcing and responding to this crisis. In the USA there are similarly important data needs — many infections and diseases are notifiable, requiring direct notification of the Centers for Disease Control and Prevention. Regular hospital visits, treatments and surgeries, however, go through a very big, very slow pipeline from local clinics and hospitals up to the state agency level, and after processing, refining and some magical treatment, these data flow back to local public health and research agencies some years later. Traditionally this timeline was “all we could do” because of technology limitations and other reasons, but as we rely more and more on access to near real-time data for so many decisions, health data often stand out as a slouch in the race for data driven decisions.

In a different vein, campaign finance data for political donations are sometimes surprisingly fast. In California, all donations to campaigns require the filing of a Form 460 declaring who gave the funds, their employer and their ZIP code. Campaigns are supposed to file these promptly, but this does not always happen until certain filing deadlines. Nevertheless, these data contain valuable insights for voters and campaigns alike. These data get submitted as a flow, but they then end up in a complex format not accessible to average people — until someone changes that. A volunteer team at OpenOakland created a very powerful automation process that takes these data and reformats them in a way that makes them accessible and understandable to everyone at http://opendisclosure.io. Yet even this system of automated data processing and visualization suffers from a lack of perfectly updated daily data: the numbers shown each day only reflect the data filed to date, so big donations or changes in patterns do not show up until they are filed — often at a somewhat arbitrary deadline.
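
To give a flavor of what that automation does, here’s a stripped-down sketch of the rollup step. The CSV layout is hypothetical; real Form 460 filings arrive in a far messier format, which is exactly the problem the OpenOakland pipeline solves.

```python
# A minimal sketch: aggregate filed contributions into readable totals.
# "contributions.csv" and its "employer"/"amount" columns are assumptions.
import csv
from collections import defaultdict

totals_by_employer = defaultdict(float)
with open("contributions.csv", newline="") as f:
    for row in csv.DictReader(f):
        totals_by_employer[row["employer"]] += float(row["amount"])

# The top funding sources: the kind of summary a voter can actually use.
for employer, total in sorted(totals_by_employer.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{employer}: ${total:,.0f}")
```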

Unfortunately, not all data are filed frequently, and many do not come with an easy to use API connection to allow developers and researchers to connect to them directly. Take crime data: very important information, with high demand for all sorts of decisions at the local level. Your police force may publish good crime data each day, or maybe just each month, which is useful for real estate people and maybe good for analysts and crime investigations. But how do we know if our local efforts have successfully impacted crime? We go to national data. The Federal Bureau of Investigation (FBI) collects data from most law enforcement agencies in the country and publishes them as the Uniform Crime Reports (UCR). Unfortunately, these data are published years after the fact. There is a convoluted process for local agencies to format and filter their reports, and then these data take years to get published.

We recently created a violent crime fact sheet using the latest (and recently published) available UCR data — for 2012. This lag means that county supervisors and other officials trying to evaluate the impact of crime prevention efforts can’t even compare their outcomes with other cities; we have to wait two more years to see whether these indicators changed in other comparable cities, or whether our interventions had a measurable impact. This sort of time lag means that no local officials have good comparable data in a reasonable time frame: a poor system for modern policy makers to rely on. The FBI is slowly working to implement a newer system, but it is not clear that the lag will improve.

Every agency responsible for collecting data for operational purposes MUST start thinking about how it can make these data safely available to decision makers and to the public on an expedited timeline. The technology to support this is now very accessible, and if necessary we should consider bifurcated approaches — the old, slow feed to state and federal agencies and a new, agile feed for local use. Privacy standards and quality are simply things that guide how we do this; they are not actual barriers unless we choose to let them be.

Government is a business, albeit one with a monopoly on services it provides — and it’s not cool for government to be making decisions using years old data when the private sector is increasingly data driven and real time. We can do this!

* First published over at Govloop

Beyond compliance, beyond reports: Data for Action

First posted here.

A week ago the famous Napa region was shaken by a magnitude 6.0 earthquake, resulting in serious damage to buildings, injuries and disruptions in services across a large area. This is something residents in the Bay Area have come to expect, and we are all waiting for the next “big one”, overdue in most experts’ opinions.

The same week, our team launched a new app in response to the disaster.

Oakland is a city with a severe housing shortage, growing anger towards gentrification and the unmeasured but very real displacement of low income residents who have called this city home for decades. It is also home to 1,378 large apartment buildings at varying risk of collapse in a quake centered closer to Oakland. The City of Oakland and the Association of Bay Area Governments (ABAG) have studied this issue; over half these buildings have been screened, but over 550 remain to be screened for risk. Many homes have been found to be safe, while 609 buildings, home to thousands of residents in apartments, have been found to be at serious risk. Called potential soft story buildings, they have a large open ground level, such as storage or garages, that could collapse in a quake, rendering those homes uninhabitable: an instant loss of thousands of affordable housing units protected under rent control. Any housing units built to replace them will surely not be affordable, resulting in a very rapid push-out of poorer residents.

So why do we civic hackers care about this? It’s a matter of equity: many residents lack good access to information relevant to their living situation, and without information, no one can act. Unfortunately, the common practice in government is to collect information and store it neatly in a report that floats around city hall as a PDF. The data live on a server somewhere, called on only when needed. We greatly respect the proactive work the City and ABAG have done in the screening efforts. However, a large number of homes remain unscreened, and there are still thousands of renters with no idea of their risk, whether through damage and injury or through displacement after the quake as a result of rent increases applied by landlords passing on retrofitting costs. Oakland’s rent control policy sadly does not clarify whether seismic retrofitting costs are borne by the landlord, the tenant or both.

Some months ago we convinced ABAG and the City of Oakland to publish the data from these surveys – a complicated inventory because of the changing status of buildings as they are screened and retrofitted. We had been planning to build a new app to raise awareness of this issue and spur action, both by tenant rights groups and by the city, to determine a policy for handling these costs and to ensure homes in Oakland are safe for residents. After the quake we realised it was an important moment to raise this issue, so we sprinted to release a new app that helps renters and homeowners see the status of their building: http://softstory.openoakland.org.

Our approach is to build tools that put information in the hands of the public in a way they can act on. In this case, the formal report is a good document, but it serves policy makers only; it does not inform or empower those living in these homes. This simple app lets any resident see how their building is rated: as exempt and not a soft-story building, as retrofitted and safe, or as potentially soft-story and at risk in a big quake.

We’ve advocated for open data with local governments for this very reason (and others). Data can be used to fill reports with snippets and summaries that help decision makers, but the default should be open for all data that have no legal reason to be protected. This information, in the hands of those actually affected by it, can do radically more than if it were still sitting on a government hard drive somewhere in city hall!

Getting Quake Safe in Oakland

Our new app helps Oaklanders get Earthquake Safe!

Oakland has 609 homes at risk of collapse or serious damage in the next earthquake we experience! These homes, called “soft-story” buildings, are multi-unit, wood-frame, residential buildings with a first story that lacks adequate strength or stiffness to prevent leaning or collapse in an earthquake. These buildings pose a safety risk to tenants and occupants, a financial risk to owners, and a risk to the recovery of the City and region.

Today we are launching a new OpenOakland web and mobile app that will help inform and prepare Oakland renters and homeowners living in these buildings at risk of earthquake damage. The new app, SoftStory.OpenOakland.org, provides Oaklanders with a fast, easy way to see if their home has been evaluated as a potential soft story building and is at increased risk in an earthquake.

The stats:

609: multi-unit buildings assessed to be at risk in Oakland

238: buildings assessed and found to be retrofitted or exempt

531: potential soft story buildings not yet fully screened

1,378: total soft story buildings in Oakland

This new app relies on data from a screening of potential soft story buildings undertaken by the City of Oakland, with data analyzed by the Association of Bay Area Governments (ABAG). Owners and renters can see if their home is considered at risk or has already been retrofitted, and learn about the risks to soft-story buildings in a serious earthquake, an event that is once again on people’s minds after last week’s magnitude 6.0 earthquake in Napa.
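
For the curious, the core of the app is a simple lookup against that published inventory. Here’s a rough sketch; the file name, column names and status values are assumptions, not the actual ABAG/City of Oakland schema.

```python
# A hedged sketch of the app's core lookup: match an address against the
# published screening inventory and report its status.
import csv

def building_status(address, inventory_path="soft_story_screening.csv"):
    """Return the screening status for an address, or None if it isn't listed."""
    needle = address.strip().lower()
    with open(inventory_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["address"].strip().lower() == needle:
                return row["status"]  # e.g. "exempt", "retrofitted", "potential soft story"
    return None

status = building_status("1234 Broadway")
print(status or "Not in the inventory; the building may not have been screened yet.")
```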

We’re launching this app as a prototype on short notice because we believe this information is critical for Oaklanders right now. The app was built with the support of ABAG and the City of Oakland, with technical support from Code for America. Once again, where local government is increasingly transparent, and where data are open and in open formats, our community can build new tools to help inform and empower residents.

To see if your building is at risk, visit: SoftStory.OpenOakland.org

Oakland gets its crime data feed on

You may not have heard about this yet, which is a shame.  It’s a shame because it’s a rare good thing in local government tech, because it’s a serious milestone for our city hall and because our local government isn’t yet facile with telling our community about the awesome things that do happen in Oakland’s city hall. But I’m excited, and I’m impressed that we’ve gotten here- Oakland’s crime report data is now being published daily, automatically, to the web, freely available for all.

This is quite a leap from where we were several years ago, and in fact just a year ago to be honest: spreadsheet hell. Often photocopied spreadsheet hell. Things happen slowly, but some things we’ve pushed for because we recognize their potential to change the game forever. First we pushed for open data as a policy in the city, and we got there quickly enough, but we’re now waiting in expectation for our new CIO Bryan Sastokas to publish the city’s very crucial open data implementation plan. Then we started pushing public records into the open with the excellent RecordTrac app, which makes all public information requests public unless they relate to sensitive crime info. And now, with local developers soaking up the public data available, we have the first ever Oakland crime feed, and an API to boot!

The API isn’t actually something the city built; it’s a significant side benefit of their Socrata data platform: just publish data in close to real time and their system will give you a tidy API to make it discoverable and connectable. You can access their API here:

http://data.oaklandnet.com/resource/ym6k-rx7a.json
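
For developers, here’s a minimal sketch of pulling from that feed. The endpoint is the one above; $limit (and friends like $where, $order and $offset) are standard Socrata SODA query parameters, but check the dataset’s metadata for its actual field names.

```python
# A hedged sketch of querying Oakland's crime report feed via the SODA API.
import requests

url = "http://data.oaklandnet.com/resource/ym6k-rx7a.json"
params = {"$limit": 100}  # SODA also supports $where, $order and $offset

resp = requests.get(url, params=params, timeout=30)
resp.raise_for_status()
reports = resp.json()  # a list of dicts, one per crime report

print(f"Fetched {len(reports)} reports")
for report in reports[:5]:
    print(report)
```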

If you’re more of an analyst, a journo or a citizen scientist, you may want the data in bulk, which you can grab here. That link will get you a file that is updated daily. Pretty rad, huh? Given how crime reports tend to trickle in (some get reported days later, some months later, and some get adjusted), the data will change over time; the only way to build a complete, accurate dataset is to constantly review for changes, update and replace records, a very complicated task. If you want a bulk chunk of data covering multiple years, with many richer fields and much higher quality geocoding, you can grab what we’ve published at http://newdata.openoakland.org/dataset/crime-reports (covers 2003 to 2013), and for the more recent historic version you can use this file: http://newdata.openoakland.org/dataset/oakland-crime-data-2007-2014
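
If you do go the bulk route, that review-and-replace chore can be largely automated. A rough sketch, assuming each report carries a stable case number (“casenumber” is a guess; check the dataset’s actual schema):

```python
# Fold a daily extract into an accumulated file, letting newer versions of a
# report replace older ones, keyed on a stable identifier.
import json

def merge_daily(accumulated_path="crime_all.json", daily_path="crime_today.json"):
    with open(accumulated_path) as f:
        by_case = {r["casenumber"]: r for r in json.load(f)}
    with open(daily_path) as f:
        for report in json.load(f):
            by_case[report["casenumber"]] = report  # the newest version wins
    with open(accumulated_path, "w") as f:
        json.dump(list(by_case.values()), f)

merge_daily()
```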

Now that we’ve figured out how to pump crime report data out of the city firewall, we can get to work connecting it to oakland.crimespotting.org and building dashboards to support community groups, city staff and more!

So thank you to the city staff who worked to get this done- now let’s get hacking!

Side bar: Oakland has finally gotten hold of its new Twitter handle: @Oakland is now online! More progress…

Oakland’s Shot Spotter action in 2013

Given OPD’s recent suggestion that they want to ditch the Shot Spotter system, and given that the data are available, it seems worthwhile to start digging into the data to see what use they may have, starting with public benefits. This map is a really simple visualization of the shots from January to October of 2013. At city level it becomes a mess, but at neighborhood level it is far more revealing. Data in web friendly formats are available here also.

You can view it fullscreen here.

http://spjika.cartodb.com/viz/bd291444-b43b-11e3-83bf-0e10bcd91c2b/embed_map?title=true&description=true&search=false&shareable=true&cartodb_logo=true&layer_selector=false&legends=false&scrollwheel=true&fullscreen=true&sublayer_options=1&sql=SELECT%20*%20FROM%20oakshots%20where%20date_tim%20between%20

To see the areas of the city formally covered by this system use these [ugly] maps.

Already Beyond the Data Portal

I was inspired by a recent piece by the wonderful @jedsundwall on his gov3.0 blog about the need to go beyond data portals (much like a recent book I contributed to, Beyond Transparency; shameless plug, yes).

Jed totally hits it with this assessment of a growing attitude in local government towards just getting the data out:

It’s time to acknowledge that data is not made useful simply by making it available online. As we work to make data open and available, we also need to train people who can help make it accessible and useful.

In cities locally and globally, the concept of open data is being pitched by vendors as a simple, turnkey thing to purchase and check off the list of good government tasks. Not enough cities have realized what an amazingly underutilized and under-leveraged resource this huge trove of data is for them. In Oakland, much of the data being published leaves a lot to be desired and leads to dozens of new questions about the source, quality, meaning and completeness of these data, but the city isn’t really embracing this as a way to engage the community and to see these data reach more of their potential.

Jed goes on to suggest an alternative reality where data support exists side by side with the data portals:

You’re doing your research, but you’ve heard of the San Diego Regional Data Library. You go to its website and see that you can email, call, or chat online with a data librarian who can help you find the information you need. You call the library and speak with a librarian who tells you that the data you need is provided by the county rather than the city. You also learn about datasets available from California’s Department of Transportation, a non-profit called BikeSD, Data.gov and some other data from the city that hasn’t been opened up yet.

This is where my two worlds collide. The #opendata and #opengov world is leading and pushing from a certain position, mostly not connected to the existing community research, indicator and data world, while the community indicators world has been slow to embrace this brave new world of easy access to data. We need to get along and understand each other’s positions and intentions; then we can really make this #datadriven world matter for our communities.

The concept of a data library is very similar to what groups like Urban Strategies Council have been doing for 15 years with our InfoAlamedaCounty.org project. For a long time we’ve seen the need to provide communities with reliable research and data to drive action, and we’ve struggled to get access to data for this entire time.

We formed the National Neighborhood Indicators Partnership in 1992 with the Urban Institute to help build the field nationally and empower more communities in just this way. We have a mandate to publish data and make usable, actionable information available to communities equally. Our partner organizations in 37 cities have local and regional knowledge, with expertise in community development, reentry, public safety, economic development, education and health, so we’re able to do more than provide raw and improved data: we’re able to be an active, responsive voice in our communities to make more data actionable.

Many NNIP partners are starting to embrace the open data world, and this is a powerful recipe for a data driven future focused on equity in our cities; most NNIP partners have a social mission, as opposed to just doing data in a cold, calculated way. But the unfortunate truth is that as our cities become more data rich, many NNIP partners face declining funding to help support community uses of data. It would be a mistake for funders to lose interest in community data intermediaries (not a sexy concept) in the excitement over open data, because none of these data become actionable and meaningful without serious support, engagement and use.

The data library is a great concept, and our experience in Oakland and many other cities says there’s huge need and value for such an entity. Our cities can themselves play some part by being more engaged through their open data resources, but that’s never going to be enough; just as Chicago has fantastic staff who engage, there’s still a role for the Smart Chicago Collaborative to bring that power out to communities across the city.

More data, more engagement, more power to the people?