How to manage data and move it from one point to another will be a major challenge for the U.S. government.
Szykman also cites five other key issues that the Commerce Department encounters with big data:
Authenticity of the data
The importance of big data lies not only in the records it generates, but also in the "replicability" of the scientific results it produces. At the academic level, this is how you prove the value of your work: others can replicate the results. Conversely, if you lose the data from which the results were derived, the validity of those results is diminished.
Data engineers
Many researchers are exploring sophisticated uses of big data, such as applying genetic data to preventive medicine, drug design, and fetal testing. But Szykman's concern is that too few people really understand the technical architecture of big data. We need to think about big data and how we can use it, especially in specific areas. Whether through direct government applications or government-funded research, governments are pushing the boundaries of big data.
Think big, plan early
Figuring out system lifecycle requirements as early as possible is increasingly important in the move to open data. In the past, open data requirements were rarely examined early in the lifecycle. Data modeling and information sharing will become more prevalent, and systematic strategies will become more common. These requirements should be considered early in the lifecycle, when a new system or application is being put in place.
Confidentiality vs. Integrity
For organizations with a research base, big data security is more than just a confidentiality issue. The long-term integrity of the data is an even bigger concern. It's a topic that the IT community has struggled with. Sometimes we focus too much on results at the expense of security. People sometimes ask, "We're all going to end up sharing this data with the public, so what's so important about security?"
The best answer to this question comes from scientific organizations such as NOAA, which collects baseline data at a time when U.S. climate change policies are highly controversial. Whatever their political leanings, these policies have a significant impact on the economy. If we neglect the security of these long-term climate records, there will be serious consequences. We do have to think about big data.
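One common way to guard the long-term integrity of a data record, as opposed to its confidentiality, is to store a cryptographic digest alongside each record and recheck it later. The following is a minimal sketch of that general idea in Python, not a description of how NOAA or the Commerce Department actually protects its data; the function names and the sample record names are hypothetical.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(records: dict[str, bytes]) -> dict[str, str]:
    """Map each record name to the digest of its current content."""
    return {name: sha256_digest(content) for name, content in records.items()}


def find_altered(records: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """List records that are missing or no longer match the stored digest."""
    altered = []
    for name, expected in manifest.items():
        content = records.get(name)
        if content is None or sha256_digest(content) != expected:
            altered.append(name)
    return sorted(altered)


# Hypothetical sample records, used only for illustration.
archive = {
    "station_a_1990.csv": b"date,temp_c\n1990-01-01,3.2\n",
    "station_b_1990.csv": b"date,temp_c\n1990-01-01,-1.4\n",
}
manifest = build_manifest(archive)

# Simulate a silent change to one record after the manifest was written.
archive["station_b_1990.csv"] = b"date,temp_c\n1990-01-01,-1.9\n"
print(find_altered(archive, manifest))
```

A manifest like this only detects changes. It does not prevent them, and it is only as trustworthy as the place where the manifest itself is kept.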
Establishing a baseline
It is sometimes difficult to assess the spending and risks of big data and other high-tech projects, because similar applications rarely exist, which makes it hard to obtain information or make comparisons. Coming up with a baseline for spending and risk is a big challenge for both big data and data centers because there are no standards yet. Even simple things, such as calculating the energy consumption of a data center, can be challenging. Big data baselines require better planning for future resources, not only at the infrastructure level but also for data packages.