One of the things we’re excited by here at Inquiron is the increasing availability of open data. For the uninitiated, this is non-personal data that is reusable without restrictions and is often free to access and use. Increasingly we are seeing publicly funded organisations being required to provide access to data. Some commercial organisations are also opting to release their data, either for the public good, to spur innovation, or some combination of the two. Individuals, startups and data science teams can all help make sense of the data, or take the lead in analysing it and using it in new ways.

One of the data sets we’ve been looking at recently is the Companies House Free Company Data Product. This is a monthly snapshot of all companies currently registered in the UK, approximately 3.2 million of them.

We’ll be writing more about this data in the future, but one of the first things that jumps out at you when looking at the data, as shown in the figure above, is that the vast majority of newly incorporated companies are unclassified (‘None Supplied’) according to the Standard Industrial Classification of Economic Activities (SIC) scheme. The SIC code describes in some detail the sector in which each company operates. So many recent companies lack one because companies are currently not required to classify themselves until they submit their first annual return.

However, Companies House is currently running a consultation on a possible simplification of company filing requirements, which closes on 22 November (thanks for the heads up, Chris Taggart). One of the questions posed is: ‘Do you agree that the SIC code should be required at incorporation and maintained as part of an annual check?’

Our answer is yes. Having more information about companies would very likely bring a range of benefits stemming from increased transparency. Several London-based startups, including OpenCorporates and DueDil, are already building businesses around this type of corporate data. Their idea is that increasing access to this data and combining it with other data will be useful for a wide range of activities, including corporate fraud detection and prevention, marketing, business and consumer intelligence and, more broadly, trust building in the business community.

The Companies House data is certainly one of the open data sets we’ll be using as part of our data science and predictive analytics services. The open data movement is gathering pace and gives us data scientists an exciting opportunity to help organisations improve data-driven decision making.

Figure caption: Here we show the total number of company records in the Companies House register, from 2010 to 2013, segmented by the month in which the company was incorporated. The different colours represent different SIC codes, of which about 435 are in use. We can see that most companies incorporated in the last year are unclassified. The data source is Companies House Free Company Data Product, September 2013.
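
For readers who want to try a similar breakdown themselves, here is a minimal sketch of how the share of unclassified companies per incorporation month might be computed from a CSV snapshot. It is not the code behind our figure. The column names `IncorporationDate` and `SICCode.SicText_1`, the day/month/year date format, and the tiny in-memory sample are all assumptions made for illustration, so check them against the header row of the file you actually download.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def unclassified_share_by_month(
    csv_file,
    date_col="IncorporationDate",   # assumed column name
    sic_col="SICCode.SicText_1",    # assumed column name
    date_format="%d/%m/%Y",         # assumed date format
    unclassified="None Supplied",   # label quoted in the post
):
    """Return {"YYYY-MM": fraction of companies with no SIC code supplied}."""
    totals = Counter()
    missing = Counter()
    for row in csv.DictReader(csv_file):
        raw_date = (row.get(date_col) or "").strip()
        if not raw_date:
            continue  # skip records with no incorporation date
        month = datetime.strptime(raw_date, date_format).strftime("%Y-%m")
        totals[month] += 1
        if (row.get(sic_col) or "").strip() == unclassified:
            missing[month] += 1
    return {month: missing[month] / totals[month] for month in sorted(totals)}


# Made-up rows standing in for the real snapshot file.
sample = io.StringIO(
    "CompanyName,IncorporationDate,SICCode.SicText_1\n"
    "A LTD,03/01/2011,62020 - Computer consultancy activities\n"
    "B LTD,15/08/2013,None Supplied\n"
    "C LTD,20/08/2013,None Supplied\n"
    "D LTD,28/08/2013,56101 - Licensed restaurants\n"
)
shares = unclassified_share_by_month(sample)
for month, share in shares.items():
    print(month, f"{share:.0%}")
```

To run it on a real snapshot, pass an open file handle in place of `sample`, for example `open(path, newline="", encoding="utf-8")`, where `path` points at your downloaded CSV.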