[examples/openspending] - openspending v0.2 (#907)

* [examples/openspending] - openspending v0.2

* [examples/openspending][m] - fix build

* [examples/openspending][xs] - fix build

* [examples/openspending][xs] - add prebuild step

* [examples/openspending][m] - fix requested by demenech

* [examples/openspending][sm] - remove links + fix bug
Luccas Mateus
2023-05-30 20:22:58 -03:00
committed by GitHub
parent cb7d801968
commit 14974edcbf
474 changed files with 25289 additions and 116 deletions


@@ -0,0 +1,79 @@
---
lead: true
title: Hutspace
authors:
- Neil Ashton
---
<div class="well">Project: scraping, analysing and publishing procurement data. In progress, with version 1.0 scheduled for publication in June 2013.</div>
<em>This chapter is based on an interview with Emmanuel Okyere, Hutspace (Ghana).</em>
Emmanuel Okyere is running a project to scrape, publish, and
analyse the Ghana procurement register, while working on his own
IT startup.
Emmanuel has built a database of contract awards for Ghana using Python
scrapers and parsers, [Celery](http://www.celeryproject.org), and PostgreSQL. The
preliminary result shows 4000 contract awards and 2000 current
procurement opportunities. Future plans include building a searchable
database and a flat CSV file download option in order to enable
journalists and CSOs to work with the data.
## Technical challenges
### Cleaning data
> “Cleaning the data has been a substantial issue for the
> project. There's a lot of validation, which we need to do before we can
> publish it simply because the data appear to be inaccurate. As an
> example, we might have unrealistically small amounts appearing in a
> contract award, or a date might not make sense. This has been the most
> substantial bottleneck for the project.”
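The kind of validation described in the quote could look roughly like the sketch below. It is illustrative only and not the project's code: the field names (`amount`, `award_date`), the thresholds, and the sample records are all assumptions made for this example.

```python
from datetime import date

# Hypothetical thresholds and field names, chosen only for illustration.
MIN_PLAUSIBLE_AMOUNT = 100.0
EARLIEST_PLAUSIBLE_DATE = date(2000, 1, 1)


def validate_award(award, today=None):
    """Return a list of human-readable problems found in one award record."""
    today = today or date.today()
    problems = []

    amount = award.get("amount")
    if amount is None:
        problems.append("missing amount")
    elif amount < MIN_PLAUSIBLE_AMOUNT:
        problems.append(f"implausibly small amount: {amount}")

    award_date = award.get("award_date")
    if award_date is None:
        problems.append("missing award date")
    elif not (EARLIEST_PLAUSIBLE_DATE <= award_date <= today):
        problems.append(f"award date out of range: {award_date.isoformat()}")

    return problems


# Invented sample records, not real procurement data.
awards = [
    {"id": 1, "amount": 250000.0, "award_date": date(2012, 3, 14)},
    {"id": 2, "amount": 3.5, "award_date": date(2012, 5, 2)},
    {"id": 3, "amount": 90000.0, "award_date": date(2031, 1, 1)},
]

# Keep only the records that have at least one problem.
flagged = {a["id"]: validate_award(a, today=date(2013, 6, 1)) for a in awards}
flagged = {k: v for k, v in flagged.items() if v}
```

Flagging records rather than dropping them keeps the suspicious values available for manual review before publication.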
### Reconciling company entities
> “Many companies appear with a variety of
> entities, and so finding a good way to reconcile companies which are
> actually the same has been difficult.”
Emmanuel plans to reconcile the company entities using a codebase from
the open parliament field, originally developed by mySociety to
reconcile the names of members of parliament.
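To give a sense of what name reconciliation involves, here is a minimal sketch using only the Python standard library. It is not the mySociety codebase mentioned in the text; the suffix list, the similarity threshold, and the company names are assumptions invented for this example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "co", "company", "enterprise", "ent"}


def normalise(name):
    """Lower-case, strip punctuation and drop common legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def same_company(a, b, threshold=0.9):
    """Guess whether two names refer to one company (threshold is arbitrary)."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold


def group_names(names, threshold=0.9):
    """Greedily cluster names; each group is keyed by the first name seen."""
    groups = {}
    for name in names:
        for canonical in groups:
            if same_company(name, canonical, threshold):
                groups[canonical].append(name)
                break
        else:
            # No existing group matched, so this name starts a new one.
            groups[name] = [name]
    return groups


# Invented company names, used only to exercise the functions.
names = ["Acme Construction Ltd.", "ACME CONSTRUCTION LIMITED", "Volta Supplies Co."]
groups = group_names(names)
```

A greedy pass like this is easy to follow but sensitive to input order and to the threshold, so matches near the cut-off would still need a manual check.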
### Identifying the correct amounts
A surprising problem has been that amounts in the procurement
data appear in varying currencies, such as GBP and USD. Finding
appropriate historical exchange rates and converting the amounts
has been cumbersome, but it is important in
order to make the data as accessible as possible.
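Converting amounts with historical rates can be sketched as follows. This is an illustration under stated assumptions rather than the project's method: the target currency (GHS), the lookup-table layout, and above all the rates themselves are placeholders, not real historical data.

```python
from datetime import date
from decimal import Decimal

# Placeholder rates for illustration only, NOT real historical data.
# Keys are (currency, date); values are units of GHS per one unit of currency.
RATES_TO_GHS = {
    ("USD", date(2012, 3, 14)): Decimal("1.70"),
    ("GBP", date(2012, 3, 14)): Decimal("2.70"),
}


def to_ghs(amount, currency, on):
    """Convert an amount to GHS using the rate recorded for that day."""
    if currency == "GHS":
        return amount
    try:
        rate = RATES_TO_GHS[(currency, on)]
    except KeyError:
        # Fail loudly instead of guessing a rate for a missing day.
        raise LookupError(f"no {currency} rate for {on.isoformat()}") from None
    return (amount * rate).quantize(Decimal("0.01"))


converted = to_ghs(Decimal("1000"), "USD", date(2012, 3, 14))
```

Using `Decimal` avoids binary floating-point rounding on money values, and raising an error on a missing rate makes gaps in the rate table visible instead of silently publishing a wrong figure.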
## Community challenges
Emmanuel points out that the main barriers to wider use of the data
are that journalists and CSOs often do not know that procurement data
is available and lack the skills to analyse it.
> “For much of the work to be done on the data, having skills to use Excel
> would actually be sufficient for journalists in order to get to work
> with the data. However, skills to use Excel for analysis are lacking
> among almost all journalists today. When it comes to more challenging
> tasks which require coding skills for analysing the data, I know
> actually only one journalist. She will be involved in this project.
> “Trainings could help equip more journalists to work with the
> procurement data we are planning to release. We really need more people
> to look at and use the data, but that requires that they have the skills. I
> think that is what trainings like data bootcamps are for.
> “As publishers of the database, we would like to build visualisations to
> spot trends in the data. For instance, we have noticed that when new
> governments get into power, we see this reflected in the procurement
> data as new contractors appear while others vanish. This is analytical
> work we can do which I think journalists will not be able to do on
> their own.”
**Next**: [Texty](../texty/)
**Up**: [Case Studies: Procurements](../)


@@ -0,0 +1,23 @@
---
lead: true
title: 'Case Studies: Procurements'
authors:
- Neil Ashton
---
<a href="http://www.flickr.com/photos/seemoredomore/4710878501/" title="Construction in Ghana by Twin Work &amp; Volunteer"><img src="http://farm5.staticflickr.com/4072/4710878501_eb22b37418_z.jpg" width="640" height="480" alt="ConstructionsGhana (7)"></a>
An important category of government spending data is data on public procurements. Procurement data concerns works, services, and goods commissioned by public authorities. The tight regulations that generally apply to public procurements create an excellent opportunity for data publication and reuse.
In this section, we look at three CSO projects that have made use of procurement data, exploring what value these CSOs have found in the data, what challenges they've faced, and what tools they have used to address those challenges.
We have found that procurement data serves an important purpose for promoting financial transparency in many countries. In particular, it is often able to fill in the blanks when [transactional spending data](../case-studies-spending/) is not available. At the national level, most EU countries do not publish transactional spending data, with the exceptions of the United Kingdom and Slovenia; nor in general do public agencies outside national government such as regional or municipal government, despite a sizeable share of government spending taking place at these levels.
Global initiatives such as OpenContracting of the World Bank Institute and more recently the procurement initiative of the Sunlight Foundation confirm that momentum is growing to promote transparency in procurement. The case studies in this section show that accessing and analysing procurement data can provide substantial improvements to the state of financial transparency, but also that data formats, data quality, and disclosure policies remain barriers to realizing the full potential of procurement data. These issues deserve attention as procurement transparency gains momentum.
* [Hutspace, Ghana](./hutspace/)
* [Texty, Ukraine](./texty/)
* [OpenTED, procurements from EU](./opented/)
**Next**: [Hutspace](./hutspace/)
**Up**: [Mapping the Open Spending Data Community](../)


@@ -0,0 +1,27 @@
---
lead: true
title: OpenTED, Opening Tender Electronic Daily
authors:
- Neil Ashton
---
This post reviews how OpenTED and OpenSpending have worked to make procurement data from the EU site, Tender Electronic Daily (TED), available as a CSV download. More than 100,000 public sector contracts, awarded by bodies ranging from tiny municipalities to large government agencies, are published annually in the European procurement register.
## Why open up EU procurement data?
TED contains procurement data on contracts awarded by any public agency within the EU and valued above the minimum threshold of EUR 200,000. In most EU countries, granular data on contract awards therefore covers a significant share of procured and projected spending.
It is an often overlooked fact that EU procurement rules apply to all majority publicly owned companies. For this reason, the public can, for instance, access data on more than 500 contracts awarded by the Swedish state-owned Vattenfall in all EU countries of operation, such as a contract awarded by its Berlin-based company, because that company is majority owned by the Swedish state.
## Project and issues
Data from TED is not available as a bulk download, and so in 2011, a small data journalism project, OpenTED, began exploring the options for scraping the data in order to make it openly available. In November 2012 and May 2013, this was explored further through community hack days in London and Brussels organised by OKF, where data was retrieved, parsed, and cleaned. The full TED data is now available as CSV files EU-wide, on a country-by-country basis, and by annual breakdown.
## Challenges
Several challenges remain, which are primarily tied to the data quality. Additional data cleaning is still needed before it is even possible to assess to what extent the TED data actually contains sufficiently useful information.
A review of data quality is needed. Preliminary findings show that significant data fields such as contract amount and contractor name are often left unreported, possibly because reporting them is not mandatory. The community involved in advancing procurement transparency, including Transparency International and the Sunlight Foundation, should examine how disclosure practices can be improved. TED is an example of a dataset whose quality can only be improved if the transparency communities across countries join forces to argue for such improvements.
**Next**: [Case Studies: From Local to Global](../../case-studies-other/)
**Up**: [Case Studies: Procurements](../)


@@ -0,0 +1,135 @@
---
lead: true
title: Texty
authors:
- Neil Ashton
---
<div class="well">Project: <a href="http://z.texty.org.ua/">z.texty.org.ua</a>, a procurement database based on data from the Ukrainian government.</div>
Texty.ua was established in 2010 as an NGO by Anatoliy Bondarenko and
Roman Kulchynsky (Editor in Chief). Both have a background in
Ukrainian media outlets: Roman as Editor in Chief at the Ukrainian
weekly Tyzhden, and Anatoliy as an editor and programmer with
a scientific educational background.
Texty decided to pursue procurements, as this proved to be the
best possible way to cover public spending, given that
transactional spending data is not available. The result was <a href="http://z.texty.org.ua/">z.texty.org.ua</a>, a
searchable database for public procurements completed in the spring
of 2012. The database is updated weekly and contains procurement data
from 2008 onwards.
State and local budgets also remain priorities for Texty, though they
do not currently have the resources to conduct analysis more frequently than
once a year. The state budget process in Ukraine is complex
and difficult to follow, so the site is currently monitoring changes
to the budget, and Texty would like to play a role in this.
## Tools
Texty works on budget and procurement data with a variety of tools.
* OpenRefine: working with raw data
* R: analysis of data
* D3.js: online data visualization
## Model
Texty sustains its activities by providing data analysis and
visualisations for both CSOs and media outlets.
They delivered [data
analysis for Forbes Ukraine](http://forbes.ua/ratings/people) concerning
concentration in procurement contracts within the business elite.
## Challenges
Texty points to the lack of resources in the data journalism
field as the biggest challenge. While both data and tools are available,
the lack of resources for completing the required data analysis
currently hinders more elaborate projects on spending transparency.
While CSOs and media outlets regularly commission data investigations from
Texty, demand is currently too low to take full advantage of the
data that is actually available. Texty supplements its investigations
by offering data-journalism trainings.
### Open database for public procurements in Ukraine
In 2011, when Texty began working on public procurements in Ukraine,
getting the data was a top priority because of the huge volumes
available and rumors about massive corruption in the field. In
2012, spending on procurements was approaching 40% of the GDP of Ukraine, which
could be one of the highest in the world.
### Problems with the governmental site
[http://tender.me.gov.ua](http://tender.me.gov.ua), the source of procurement data, presents several issues. It requires an account and login, and it only gives access to the
data via an HTML table showing at most 100 results from a single issue of the
official bulletin about public procurements. No tables are sortable, and
no records are linked to one another. Finally and most
importantly, the data is dirty; you can, for example, easily find several different
versions of the same supplier (company) name.
## Getting data from the government site
The Texty team wrote a Ruby script to mimic user login, check for
updates, and to scrape data from HTML webpages, all of which had a
different structure. After cleaning, they imported the data into a relational
database as normalised data, for example creating links between records
for each participant. The database is updated approximately twice per
week.
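The normalised import described above can be illustrated with a small sketch. The Texty team used Ruby and MySQL; this example uses Python with an in-memory SQLite database purely as a stand-in, and the table names, columns, and sample rows are all assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participants (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE tenders (
        id             INTEGER PRIMARY KEY,
        subject        TEXT NOT NULL,
        amount         REAL NOT NULL,
        participant_id INTEGER NOT NULL REFERENCES participants(id)
    );
""")


def participant_id(conn, name):
    """Return the id for a cleaned participant name, creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO participants (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT id FROM participants WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def import_tender(conn, subject, amount, participant_name):
    """Store one scraped tender, linked to its participant record."""
    pid = participant_id(conn, participant_name.strip())
    conn.execute(
        "INSERT INTO tenders (subject, amount, participant_id) VALUES (?, ?, ?)",
        (subject, amount, pid),
    )


# Invented rows standing in for scraped, cleaned data.
import_tender(conn, "Road repair", 120000.0, "Example Bud")
import_tender(conn, "School roof", 45000.0, " Example Bud ")
import_tender(conn, "Office paper", 900.0, "Example Papir")

# Because tenders point at one participant row, totals are a simple join.
totals = conn.execute("""
    SELECT p.name, COUNT(*), SUM(t.amount)
    FROM tenders t JOIN participants p ON p.id = t.participant_id
    GROUP BY p.name ORDER BY p.name
""").fetchall()
```

Storing each participant once and referencing it from every tender is what makes it cheap to list all deals won by a company or to total volumes per participant.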
The tool stack:
* [nginx](http://wiki.nginx.org/Main)
* [sinatra](http://www.sinatrarb.com/)
* [mysql](http://www.mysql.com/)
* [Tangle.js](http://worrydream.com/Tangle/) (for a novel approach to the user interface)
## Features
From the main page, it is possible to explore data about tenders in real time: changing the textual query immediately returns the total volume for a particular industry, participant, and/or period of time.
Additionally, clicking on a total volume lists all the tenders it comprises. For each company participating in a tender, the database contains information on all other deals which the company has won. Recently, an "advanced search" page has been added, with the option to export results in a simple and portable CSV format.
## Impact and coverage of the project
One year into the project's existence, the site reached about 1,500
users per day, despite having almost zero advertising. It has gained
attention and been used by investigative journalists as well. Some
stories were published in the biggest independent
internet outlet, Ukrainian Pravda, which has approximately 200,000 readers per day.
In Autumn 2012, a joint project with Forbes.ua called "Champions of
tenders" was launched. The Texty team shared the open part of their data, information about
deals from their database (including the names of firms and volumes of money),
through a simple web API. Next, the team from Forbes.ua used the data in
their database to link firms to names of owners (Forbes.ua maintains a
proprietary database of these). The Texty team also made an [interactive
visualization of this data](http://forbes.ua/ratings/people) for Forbes.ua.
<a href="http://www.flickr.com/photos/94746900@N06/8895650387/" title="thumbnail by anderspedersenOKF, on Flickr"><img src="https://farm9.staticflickr.com/8123/8895650387_c1f6582979_o.jpg" width="600" height="373" alt="thumbnail"></a>
## Impact of open tender data
Since 2008, when information about tenders became openly available for
the first time, there has been a shift in public opinion about
tenders and public spending on procurement. Today there seems to be a
real awareness about corruption in procurements, though still not a
clear idea about the actual scale of the problem. For example, there is
even a TV programme called "Tenders News" on TVi, a channel that opposes
the government.
Ukraine has a couple of projects about tenders, though Texty appears to
be the most sizeable and complete database. There have, however, been continuing lobbying attempts to close down access to
as much information about tenders as possible, and many of these have
unfortunately been successful. The most recent example was a law passed by a
majority of the Ukrainian parliament in Autumn 2012, which meant that 35% of
the total volume of tenders would be hidden from the public.
The ongoing hope for transparency in public procurement is based on a
proposed agreement about association between Ukraine and the EU, which
includes requirements about transparency in tenders.
**Next**: [OpenTED, Opening Tender Electronic Daily](../opented/)
**Up**: [Case Studies: Procurements](../)