An In-Depth Analysis of the Revolutionary Prospects of Big Data

"Big data" is the buzzword of the day, the technology community's all-purpose answer to the world's toughest problems. The term generally describes the art and science of analyzing massive amounts of information to detect patterns, glean valuable insights, and predict answers to complex questions. It may sound a bit dull, but for its advocates, from stopping terrorists to eradicating poverty to saving the planet, there is no problem big data cannot solve.

Viktor Mayer-Schönberger and Kenneth Cukier, in their modestly titled book "Big Data: A Revolution That Will Transform How We Live, Work, and Think," cheer that "the benefits to society will be endless, as big data will go some way to solving looming global problems such as dealing with climate change, eradicating disease, and fostering good governance and economic development."

With enough data to work with -- whether it's data from your iPhone, your grocery shopping habits, online dating profiles, or anonymized health records for an entire country -- and the computational power to decode that raw data, one can gain countless valuable insights. Even the Obama administration has caught on: on May 9 it released an "unprecedented" amount of "previously inaccessible or unmanageable data" to entrepreneurs, researchers, and the public.

But is big data really all it's cracked up to be? Can we trust that the many 1s and 0s will reveal the hidden world of human behavior? Here are the authors' musings on so-called big data theory.

1. "With enough data, numbers can speak for themselves"

No way. Advocates of big data want us to believe that behind the lines of code and vast databases lie objective, universally valuable insights into patterns of human behavior, whether consumer spending, criminal or terrorist activity, health habits, or employee productivity. But many big data evangelists are unwilling to confront its shortcomings.

Numbers can't speak for themselves, and datasets -- no matter what size they are -- are still the product of human design. The tools of big data -- such as the Apache Hadoop software framework -- don't free us from misinterpretation, gaps, and faulty assumptions.

These factors matter most when big data tries to reflect the social world we live in, yet we are often fooled into thinking that its results are always more objective than human opinion. Bias and blind spots exist in big data just as they do in personal feelings and experiences. Still, a questionable credo persists: that bigger data is always better, and that correlation equals causation.

Social media, for example, is a pervasive source of information for big data analytics, and there is undoubtedly a lot of information to be mined there. We're told that data from Twitter shows that people are happier the farther away from home they are and most depressed on Thursday nights. But there are many reasons to question the meaning of this data. For one thing, the Pew Research Center reports that only 16 percent of U.S. adults with Internet access use Twitter, so Twitter users are by no means a representative sample -- they are disproportionately young and urban compared to the population as a whole.

Additionally, we know that many Twitter accounts are automated programs called "bots," fake accounts, or "quasi-bot" systems (i.e., human-controlled accounts assisted by a bot program). Recent estimates suggest that there may be as many as 20 million fake accounts. So even before we step into the methodological minefield of how to assess the sentiment of Twitter users, let's ask whether that sentiment is coming from real people or automated algorithms.

2. "Big data will make our cities smarter and more efficient"

To a certain extent, yes. Big data can provide valuable insights to help improve our cities, but it can only take us so far. Because data is not all created or collected equally, big data sets suffer from a "signal problem": certain populations and communities are ignored or underrepresented, leaving what are known as data dark zones or shadow zones. So the use of big data in urban planning depends heavily on municipal officials' understanding of the data and its limitations.

Boston's Street Bump app, for example, is one of the smarter ways to gather information at low cost. The program collects data from the smartphones of motorists who drive over potholes in the road. More apps like it are on the horizon. But if cities start relying on information from smartphone users alone, they will be drawing on a self-selecting sample, which will inevitably mean less data from neighborhoods with fewer smartphone users, a population that often includes older and less affluent citizens.
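The skew described above can be made concrete with a small sketch. Everything in it is hypothetical: the three neighborhoods, their pothole counts, and the share of drivers running the app are invented for illustration, and the correction shown (inverse-probability weighting) is a standard statistical technique, not something the article attributes to Boston.

```python
# Hypothetical neighborhoods: (name, actual potholes, share of drivers using the app).
# All numbers are invented for illustration only.
NEIGHBORHOODS = [
    ("A", 100, 0.80),  # affluent, high smartphone adoption
    ("B", 100, 0.40),
    ("C", 100, 0.10),  # older, less affluent, low adoption
]


def expected_reports(potholes, app_share):
    """Expected reported potholes if only app users generate reports."""
    return potholes * app_share


def report_shares(neighborhoods):
    """Share of all reports that comes from each neighborhood."""
    reports = {name: expected_reports(p, s) for name, p, s in neighborhoods}
    total = sum(reports.values())
    return {name: r / total for name, r in reports.items()}


def reweighted_estimates(neighborhoods):
    """Inverse-probability weighting: divide reports by the app share.

    This only works if the city actually knows the app share per
    neighborhood, which is itself an assumption.
    """
    return {
        name: expected_reports(p, s) / s
        for name, p, s in neighborhoods
    }


if __name__ == "__main__":
    for name, share in report_shares(NEIGHBORHOODS).items():
        print(f"{name}: {share:.1%} of reports")
    print(reweighted_estimates(NEIGHBORHOODS))
```

In this toy setup every neighborhood has the same number of potholes, yet the raw reports make neighborhood A look far worse off than C. The reweighting recovers the true counts only because the adoption rates are assumed to be known, which is exactly the kind of understanding of the data's limits that the argument calls for.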

While Boston's Office of New Urban Mechanics has made several efforts to remedy these potential data deficiencies, less conscientious public officials may miss such remedies and end up with unbalanced data that further exacerbates existing social injustices. One need only look back to the 2012 "Google Flu Trends," which overestimated annual flu rates, to recognize the impact that reliance on flawed big data can have on public services and public policy.

"Open government" programs that make government data available online, such as the Data.gov website and the "White House Open Government Initiative," also have problems. More data will not necessarily improve any government function, including transparency and accountability, unless there are mechanisms to keep the public and public agencies engaged, not to mention strengthening the government's ability to interpret the data and respond with adequate resources. None of this is easy. The fact is, we don't have many highly skilled data scientists yet. Universities are now scrambling to define the field, develop curricula, and meet market demand.

3. "Big data doesn't discriminate between different social groups"

Hardly. Another promise of big data's claimed objectivity is less discrimination against minorities: raw data is supposedly free of social bias, and analysis is done at the level of the whole population, which should avoid group-based discrimination. However, because big data can make claims about how groups behave differently, it is often used for precisely that purpose: sorting individuals into groups. For example, a recent paper alleges that scientists have allowed their own racial biases to influence big data studies of genomes.

Big data has the potential to be used for price discrimination, raising serious civil rights concerns. The practice has historically been known as "redlining". Most recently, a University of Cambridge big data study of 58,000 Facebook "likes" was used to predict extremely sensitive personal information about users, such as sexual orientation, race, religious and political views, personality traits, intelligence level, happiness, addictive substance use, parental marital status, age and gender.

Journalist Tom Foremski said of the study, "This kind of easily accessible, highly sensitive information can be used by employers, landlords, government departments, educational institutions and private organizations to discriminate against and punish individuals. And people have no means of fighting it."

Finally, consider the impact on law enforcement. From Washington, D.C., to New Castle County, Delaware, police are turning to big data's "predictive policing" models in hopes of shedding light on unsolved crimes and even helping to prevent future ones. But focusing police efforts on specific "hot spots" identified by big data runs the risk of reinforcing police suspicion of already stigmatized social groups and institutionalizing differential enforcement.

As one police commissioner has written, while predictive policing systems do not take into account factors such as race and gender, the practical results of using such systems without consideration of disparate impact can "lead to a deterioration of police-community relations, a public perception of a lack of judicial process, allegations of racial discrimination, and threats to the legitimacy of the police."

4. "Big data is anonymous, so it doesn't invade our privacy"

Big mistake. Despite the best efforts of many big data providers to strip identifying details from datasets about people, the risk of re-identification remains high. Cell phone data may seem fairly anonymous, but a recent study of a dataset of 1.5 million cell phone subscribers in Europe showed that just four points of reference were enough to uniquely identify 95 percent of them.

The researchers point out that the paths people take through cities are highly distinctive, and given how much information can be inferred from large public datasets, this makes personal privacy a "growing concern".
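A toy sketch shows why a handful of location points can single someone out. The traces below are invented, and the measure is a simplified, exhaustive stand-in for the random sampling such a study would use: it checks every combination of k points from each person's trace and counts how often that combination matches one person only. The names `TRACES`, `matches`, and `uniqueness` are hypothetical.

```python
from itertools import combinations

# Hypothetical traces: user -> set of (hour of day, cell tower id) observations.
TRACES = {
    "u1": {(8, "T1"), (9, "T2"), (18, "T3")},
    "u2": {(8, "T1"), (9, "T2"), (18, "T4")},
    "u3": {(8, "T1"), (10, "T5"), (18, "T3")},
    "u4": {(7, "T6"), (9, "T2"), (18, "T3")},
}


def matches(traces, points):
    """Users whose trace contains every observation in `points`."""
    return [user for user, trace in traces.items() if points <= trace]


def uniqueness(traces, k):
    """Share of k-point subsets (over all users) that match exactly one user."""
    total = 0
    unique = 0
    for trace in traces.values():
        for subset in combinations(sorted(trace), k):
            total += 1
            if len(matches(traces, set(subset))) == 1:
                unique += 1
    return unique / total if total else 0.0


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, uniqueness(TRACES, k))
```

Even in this tiny example, where no names or phone numbers appear anywhere, uniqueness climbs from a quarter of single points to every three-point combination. Removing identifiers does little when the pattern of movement is itself the identifier.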

But the privacy concerns of big data go far beyond the usual identification risks. Medical data currently being sold to analytics companies could be used to track down your identity. There's a lot of talk about personalized medicine, and the hope is that in the future drugs and other therapies can be developed for individuals as if they were made using the patient's own DNA.

That's a wonderful prospect in terms of improving the efficacy of medicine, but it essentially relies on identifying individuals at the molecular and genetic level, and that kind of information carries a lot of risk if it's used inappropriately or compromised. Despite the rapid growth of personal health data collection apps like RunKeeper and Nike+, the use of big data to improve healthcare in practice is still more of an aspiration than a reality.

Highly personal big data sets will be a prime target for hackers and leakers. WikiLeaks has been at the center of several of the worst big data leaks in recent years. As we saw with the massive data breach of the UK's offshore financial industry, the personal information of the world's richest 1 percent is as vulnerable to public disclosure as everyone else's.

5. "Big data is the future of science"

Partially true, but it has some growing up to do. Big data offers new avenues for science. We need only look at the discovery of the Higgs boson, the product of the largest grid computing project in history. In that project, CERN used the Hadoop distributed file system to manage all the data. But unless we recognize and begin to address some of the inherent inadequacies of big data in reflecting human life, we may be making major public policy and business decisions based on false assumptions.

To address this problem, data scientists are beginning to collaborate with social scientists. Over time, this will mean finding new ways to combine big data strategies with small data research. This will go far beyond practices used in advertising or marketing, such as focus groups or A/B testing (i.e., showing users two versions of a design or result to determine which one works better).

Rather, the new hybrid approach will ask people why they do certain things, instead of just counting how often something happens. This means that sociological analysis and ethnographic insight will be used alongside information retrieval and machine learning.
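For contrast, the counting-only style of A/B testing mentioned earlier can be sketched in a few lines. This is a minimal illustration using a standard two-proportion z-test; the function name and the visitor and conversion figures are hypothetical and do not come from any study cited here.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of versions A and B.

    Returns (z, two-sided p-value) under a pooled-variance normal
    approximation, which assumes reasonably large samples.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical figures: 120 of 1,000 visitors convert on A, 150 of 1,000 on B.
    z, p = two_proportion_z_test(120, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A test like this can suggest that version B outperforms version A, but it says nothing about why visitors preferred it. That gap is what the interviews and ethnographic work described above are meant to fill.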

Technology companies realized early on that social scientists could help them gain deeper insights into how and why people relate to their products, as when Xerox's research center hired the pioneering anthropologist Lucy Suchman. The next stage will be to further enrich the collaboration among computer scientists, statisticians, and social scientists of many disciplines -- not only to test their findings, but also to ask very different kinds of questions with greater rigor.

Considering the vast amount of information about us that is collected every day -- including Facebook clicks, global positioning system (GPS) data, medical prescriptions, and Netflix queues -- we will sooner or later have to decide whom to entrust with such information and for what purposes.

We can't avoid the fact that data is by no means neutral; it's hard to keep it anonymous. But we can draw on expertise that spans different domains so that we can better discern biases, flaws, and prejudices.
