Security and research information is a hot topic right now for a number of reasons, not least the transparency agenda and the desire, quite rightly, to be as open as possible about data relating to the conditions and outcomes of clinical trials. However, the information security of research data more generally has always been difficult to quantify. Go back to the excellent ‘The Cuckoo’s Egg’ by Cliff Stoll and there is debate regarding the perceived value of research data:
“… our data was either worth nothing or zillions of dollars. How much is the structure of an enzyme worth? What’s the value of a high temperature super conductor? The FBI thought in terms of bank embezzlement; we lived in a world of research.”
Clinical research is big business. Nations and companies compete to be first to deliver new drugs and treatments to patients. This means the security of clinical research data is under pressure, particularly when juxtaposed with the transparency agenda and our desire to open data up as much as we can to facilitate improved performance through insight derived from information.
But what is open data? The comedy answer is that it is data that is open! But it is as simple as that for our organisation and, I believe, for clinical research data generally. We need to transform the power of open data to make it useful. We need to make open data valuable, and to do this we need to show the value that open data creates.
Open data, to our organisation, is:
“…data that can be made use of by anyone within the organisation through analysis, linkages, and evidence based delivery.”
Many will say that this is not really open at all, as we are limiting access to the detailed elements of information to within the clinical research organisation. But even that is far more open than we have been able to orchestrate in the past, and it is a starting point for how we begin to open up appropriate information.
Compared to two years ago we have made great strides forward in our ability to be more transparent. For example, we now have apps that enable the public to see what research is underway in the NHS, and systems that enable the life sciences industry to track their studies throughout the NHS.
The definition of protecting open data stretches beyond the definition of information security. The easiest way to understand how to protect open data is to break down the principles that affect it:
Content: The what – Information relating to the performance of a clinical trial in the NHS, the resources it uses, the resources it requires and the content of the trial.
Scope: The why – To enable insight from data through linkage to other data and through exposure via business intelligence tools that deliver information to decision makers.
Policing: The rules – Data relating to competing industry partners shall not be made available other than in aggregate form. Data linkages to other data sources both within the organisation and open data sources can be done by those within the organisation.
Stakeholders: The who – Only bona fide individuals from within the organisation shall have access to the full data set. Decisions to open up further will be taken in conjunction with all stakeholders.
Lifespan: The when – As near real time as possible, and clearly labelled with the point in time it relates to.
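The policing and stakeholder rules above could be sketched in code along the following lines. This is only an illustration under stated assumptions: the `TrialRecord` fields and the `view_for` helper are hypothetical names invented for the example, not part of any real system, and the idea of a non-internal viewer is itself an assumption about how access might later be widened.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialRecord:
    # Hypothetical record shape, assumed for illustration only.
    sponsor: str    # industry partner running the study
    site: str       # site hosting the study
    recruited: int  # participants recruited so far


def view_for(records, is_internal):
    """Return full records for internal users, otherwise per-site totals.

    The aggregate view drops the sponsor field entirely, so data about
    competing industry partners is only ever exposed in aggregate form.
    """
    if is_internal:
        return list(records)
    totals = Counter()
    for record in records:
        totals[record.site] += record.recruited
    return dict(totals)
```

Keeping the decision in one function means any later agreement with stakeholders to open up further changes a single rule rather than every report that reads the data.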
Now that these key principles are known, defined and agreed across the business, it becomes much easier to create matching key principles for securing and governing the open data:
Managing the re-use of data. This becomes more important with open data. As the authoring organisation, there is a need to know, and to some degree control, where data is re-used, particularly where data linkages are possible. The implications of reputational damage from the reuse of data need to be managed, and a record of the owner or author of open data always needs to be maintained so that data can be traced back to the originator.
Corporate responsibility for the delivery of open data and its governance is exercised through the policing of a code of conduct for the use of data. Enforcing consequences for non-compliance becomes a corporate responsibility. The consequence, though, needs to be commensurate with the non-compliance, so the removal of a licence to use the open data could be a proportionate response to misuse.
Triggers for the review of the open nature of the data also need to be in place. An external review of data, maybe even an organisation-wide audit, can enhance the definition of open data and the trust in it. The NHS has an Information Governance review each year, which we now comply with as an organisation, and as we further enhance our definition of open data we will audit our own open data policies and procedures.
An area of great care that has to be considered is ‘small numbers’. Can linking open data in small numbers break confidentiality and expose identifiers that should remain secure? A policy on the opening up of data relating to ‘small numbers’ has to be created and adhered to.
The veracity of open data, and the speed at which open data is created and linked to other data sources, need to be managed and governed. The data that we open up has quickly become a big data set, which requires additional policies and procedures to protect it.
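To make the ‘small numbers’ concern above concrete, one common approach is to withhold any published count that falls below a threshold. The sketch below is a minimal illustration, not our actual policy: the function name `suppress_small_counts` is hypothetical and the threshold of 5 is an assumed value that a real policy would need to set.

```python
def suppress_small_counts(counts, threshold=5):
    """Replace any count below `threshold` with None so it is not published.

    `threshold=5` is an assumption for illustration. Note that zero counts
    are also withheld by this rule.
    """
    return {
        key: (n if n >= threshold else None)
        for key, n in counts.items()
    }


# Example: recruitment counts per site, with one small cell withheld.
published = suppress_small_counts({"Site A": 12, "Site B": 3, "Site C": 5})
```

A rule like this does not by itself stop a withheld value being recovered by subtracting the published cells from a published total, so a real policy would also need to consider what totals and linked data sets are released alongside.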
The design of open data security can make or break its implementation. Even the word ‘open’ can cause some issues and certainly there is a nervousness around the concepts of opening up data in any industry, which is why great care needs to be taken to demonstrate the business benefits and clearly communicate the checks and measures in place.
For our organisation the business benefits of a secured open data solution are:
Create the ability for “crowd source analysis” of big data
Data linkage between other open data, enabling new insight
Transparency for stakeholders
Data re-use to create an information ecosystem
Improved data quality through the exposure of data
By realising these benefits the value of open data becomes apparent, and the checks and measures in place allow a wider audience to be considered for access to the data. Open data needs to be actionable but also beautiful and simple. Open data creates the power to disrupt, improve and make the world a better place, and makes research easier to complete, more quickly, more successfully and at a lower cost.
An expert in the field recently said to me that he was fond of a clinical research ‘reverse adage’: “A month in the lab can save 30 minutes in the library.” He shared a more recent version of it: “A year’s worth of a clinical trial can save a day analysing data!”