WASHINGTON—You may never have heard the term "data mining," but it's at the core of the argument that's raging over government eavesdropping on Americans. It's also how commercial companies learn about who you are, where you go, what you eat, what you like, what you buy.
Data mining is the process of using computer technology to extract the knowledge that's buried in enormous volumes of undigested information. Trillions of bits of raw data are culled from telephone calls, e-mails, the Internet, airlines, car rentals, stores, credit card records and a myriad of other sources spawned by the information age.
"A lot can be learned about a person through the combination of massive amounts of data and the use of sophisticated analytical techniques," said Daniel Solove, an associate law professor at George Washington University in Washington.
Whenever you search for information or a product on the Internet, say on Google or Yahoo, you leave a trace.
"Every single search you've ever conducted—ever—is stored on a database somewhere," said Tim Wu, a professor at Columbia Law School in New York. "There's probably nothing more embarrassing than the searches we've made."
Once it's been collected, the data harvest is stored, organized, searched and analyzed by computer programs that carry out complex step-by-step procedures called algorithms.
The programs scour the data for hidden patterns or relationships, such as a suspicious number of insurance claims by an individual or repeated phone calls between, for example, Afghanistan and Detroit.
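A minimal sketch of that kind of pattern search follows. It assumes the raw data has already been boiled down to simple records, and it treats "suspicious" as nothing more than "appears at least a threshold number of times." The people, routes and threshold are hypothetical, and real systems rely on far more elaborate statistical models.

```python
from collections import Counter

# Hypothetical records: (claimant, claim_id) pairs and (origin, destination) call logs.
claims = [("alice", 1), ("bob", 2), ("alice", 3), ("alice", 4), ("carol", 5)]
calls = [("Afghanistan", "Detroit"), ("Detroit", "Chicago"),
         ("Afghanistan", "Detroit"), ("Afghanistan", "Detroit")]

def flag_frequent(keys, threshold):
    """Return the keys that occur at least `threshold` times, sorted for stable output."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n >= threshold)

# Assumed rule: three or more occurrences counts as a pattern worth a closer look.
suspicious_claimants = flag_frequent((person for person, _ in claims), threshold=3)
suspicious_routes = flag_frequent(calls, threshold=3)
print(suspicious_claimants)  # ['alice']
print(suspicious_routes)     # [('Afghanistan', 'Detroit')]
```

A fixed threshold keeps the illustration simple. A real fraud or intelligence system would more likely compare each count against what is normal for similar people or routes.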
The Senate Judiciary Committee will open hearings Monday on the Bush administration's use of wiretaps to monitor such calls without a court warrant.
Data mining can turn up potentially meaningful patterns such as this one: Person A telephoned B, who e-mailed C, who met with D and E, who rented an apartment together in F-town. Someone at that apartment made a phone call to someone in Country G in the Middle East. Human investigators can take it from there.
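The chain in that example amounts to a path through a network of contacts. The sketch below assumes the links have already been pulled out of phone, e-mail and rental records into a simple lookup table. It uses breadth-first search, one standard way to find the shortest chain connecting two parties, and the entries mirror the hypothetical A through G above.

```python
from collections import deque

# Hypothetical contact links: each entry is one observed connection
# (a call, an e-mail, a meeting, a shared lease).
links = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D", "E"],
    "D": ["apartment in F-town"],
    "E": ["apartment in F-town"],
    "apartment in F-town": ["Country G"],
}

def find_chain(graph, start, goal):
    """Breadth-first search: return the shortest chain of links from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of links connects the two

print(find_chain(links, "A", "Country G"))
# ['A', 'B', 'C', 'D', 'apartment in F-town', 'Country G']
```

The search only surfaces the connection. Deciding whether it means anything is still the investigators' job.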
Data miners are like gold or diamond miners, who have to burrow through tons of useless material to get the nuggets they want. They couldn't do it without modern computing systems.
"Human analysts with no special tools can no longer make sense of enormous volumes of data," says an advertisement from Megaputer Intelligence Inc., a data-mining firm in Bloomington, Ind. "Data mining automates the process of finding relationships and patterns in raw data."
In the war against terrorism, data mining is a way to "connect the dots," something the government failed to do before the Sept. 11 attacks on the World Trade Center and the Pentagon.
Jeffrey Ullman, a computer scientist who teaches a course on data mining at Stanford University in Palo Alto, Calif., offered a hypothetical example: Suppose you wanted to check a list of 10 suspected evildoers to see if any two of them spent two nights in the same hotel at the same time, perhaps to plot a terrorist attack.
According to Ullman, you'd have to search through at least 250,000 names to spot the suspicious meeting. That's too much for a human analyst but not for a computer.
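Ullman's hypothetical can be illustrated with a small pairwise check. This is not a method Ullman described. It is a sketch that assumes hotel records have been reduced to (guest, night, hotel) entries and reads "two nights in the same hotel at the same time" as two different nights on which both people were registered at the same hotel. The guests and hotels are made up. Grouping guests by night and hotel first means the program only ever pairs up people who actually shared a stay, instead of comparing every record with every other record.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical hotel records: (guest, night, hotel).
stays = [
    ("suspect1", 1, "Hotel X"), ("suspect2", 1, "Hotel X"),
    ("suspect1", 5, "Hotel Y"), ("suspect2", 5, "Hotel Y"),
    ("suspect3", 1, "Hotel X"), ("suspect3", 9, "Hotel Z"),
    ("bystander", 5, "Hotel Y"),
]
suspects = {"suspect1", "suspect2", "suspect3"}

def shared_nights(records, watch_list, min_nights=2):
    """Return pairs of watch-listed guests who stayed in the same hotel on the
    same night on at least `min_nights` different nights."""
    # Group watch-listed guests by (night, hotel); everyone else is ignored.
    guests_at = defaultdict(set)
    for guest, night, hotel in records:
        if guest in watch_list:
            guests_at[(night, hotel)].add(guest)
    # Count, for each pair, how many night-and-hotel stays they shared.
    pair_counts = defaultdict(int)
    for guests in guests_at.values():
        for pair in combinations(sorted(guests), 2):
            pair_counts[pair] += 1
    return sorted(pair for pair, n in pair_counts.items() if n >= min_nights)

print(shared_nights(stays, suspects))  # [('suspect1', 'suspect2')]
```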
"Through data mining, (government) agencies can quickly and efficiently obtain information on individuals or groups by exploiting large databases containing personal information," the Government Accountability Office, the investigative arm of Congress, said in a report to Congress last year.
"Before data aggregation and data mining came into use, personal information contained in paper records stored at widely dispersed locations, such as courthouses or other government offices, was relatively difficult to gather and analyze," the GAO said.
A GAO survey found almost 200 data-mining programs in operation or planned at 52 government agencies in 2004.
For example, the State Department draws on a Citibank system to detect fraud or waste by employees using government credit cards.
There's a "greatly increased government hunger for private information of all sorts," said Jonathan Zittrain, an expert on the social implications of the Internet at Harvard Law School in Cambridge, Mass. "As such databases grow, the government essentially possesses its own stockpile of the nation's communications on which to perform searches."
A national security data-mining operation might work like this: A search engine—perhaps similar to Google's—monitors phone calls and communications over the Internet, collecting certain key words, such as "bin Laden," "the sheik" or "nuclear plant." It stores the findings in a computer database and looks for links between the key words and other names, places or telephone numbers.
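The keyword-and-link step described above can be sketched in a few lines. This sketch assumes the communications are already available as text with sender and recipient numbers attached. It matches whole phrases regardless of capitalization, and it records each keyword alongside the phone numbers on both ends of the exchange. The keyword list comes from the article, but the messages and numbers are hypothetical.

```python
import re
from collections import defaultdict

# Key words from the example above; the intercepted messages are invented.
KEYWORDS = ["bin laden", "the sheik", "nuclear plant"]
messages = [
    {"from": "555-0101", "to": "555-0199", "text": "Ask the sheik about the nuclear plant."},
    {"from": "555-0123", "to": "555-0145", "text": "Dinner at seven?"},
    {"from": "555-0199", "to": "555-0777", "text": "The Sheik says wait."},
]

def index_keywords(msgs, keywords):
    """Map each keyword to the set of phone numbers in messages that mention it."""
    # Whole-phrase, case-insensitive patterns, so "The Sheik" matches "the sheik".
    patterns = {kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
                for kw in keywords}
    links = defaultdict(set)
    for msg in msgs:
        for kw, pattern in patterns.items():
            if pattern.search(msg["text"]):
                # Store the finding: link the keyword to both parties in the exchange.
                links[kw].update({msg["from"], msg["to"]})
    return links

links = index_keywords(messages, KEYWORDS)
print(sorted(links["the sheik"]))  # ['555-0101', '555-0199', '555-0777']
```

Number 555-0199 shows up in two separate exchanges that mention the same key word. That kind of overlap is what an analyst would follow up on next.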
To make sense of the findings, analysts may use a "data visualization" program to create a three-dimensional map, showing the words as hills on a landscape. Higher peaks mean the words appear more frequently. Closer peaks mean the words are related in some fashion.
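The numbers behind such a map can be computed before anything is drawn. The sketch below assumes each document has already been reduced to the set of key words it contains. It takes peak height to be the number of documents mentioning a word. For closeness it uses Jaccard similarity, which is the share of documents mentioning either word that mention both. That measure is a stand-in chosen for illustration, not necessarily what any visualization product uses, and the documents are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, each reduced to the key words it contains.
documents = [
    {"nuclear plant", "the sheik"},
    {"nuclear plant", "the sheik", "bin laden"},
    {"bin laden"},
    {"nuclear plant"},
]

# Peak height: how many documents mention each word.
heights = Counter(word for doc in documents for word in doc)

def relatedness(a, b, docs):
    """Jaccard similarity (0 to 1): documents mentioning both words / documents mentioning either."""
    with_a = {i for i, d in enumerate(docs) if a in d}
    with_b = {i for i, d in enumerate(docs) if b in d}
    return len(with_a & with_b) / len(with_a | with_b)

for a, b in combinations(sorted(heights), 2):
    # A higher score would place the two peaks closer together on the map.
    print(a, "|", b, round(relatedness(a, b, documents), 2))
```

Drawing the actual three-dimensional landscape would be a separate job for a plotting or layout tool.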
Data-mining tools also are used in marketing, finance and politics. Investigators detect insurance fraud. Businesses get leads on good sales prospects. Police confirm which precincts are the most crime-ridden. Political candidates learn where best to spend their time and money.
Quadstone, a data-mining firm in Boston, touts its services: "We've created software that can predict your customer's behavior. Whether you're in the banking, brokerage, insurance, retail, or telecommunications industries, we give you the ability to use past customer history as a tool to understand, predict, and influence their future behavior."
The distinction between government and private data mining is blurring.
"Agencies at all levels of government are now interested in collecting and mining large amounts of data from commercial sources," the GAO reported. "Agencies may use such data ... to perform large-scale data analysis and pattern discovery in order to discern potential terrorist activity by unknown individuals."
The FBI's Foreign Terrorist Tracking Task Force, for example, submits queries to commercial databases for information on suspected suicide bombers, which it can combine with secret government files.
Several government data-mining projects—such as Total Information Awareness and the MATRIX, an acronym for Multistate Anti-Terrorism Information Exchange—were canceled after a public uproar.
Other government data-mining projects include Talon, a program run by the Pentagon's Counterintelligence Field Activity, which collects reports on demonstrators outside U.S. military bases. Thousands of such reports are stored in a database called Cornerstone and are shared with other intelligence agencies.
The Pentagon's Advanced Research and Development Activity, based at Fort Meade, Md., runs a research program whose goal is to develop better ways to mine huge databases to "help the nation avoid strategic surprises ... such as those of September 11, 2001."
Data-mining experts make a distinction between the appropriate use of the technology to detect terrorists or catch criminals and its possible misuse to invade privacy or inhibit free communication.
"The realization that every digital movement is recorded and monitored itself will chill private behavior," Zittrain wrote in the Harvard Law Review.
But Gregory Piatetsky, a Boston-based consultant to data-mining companies, defended the technology in an e-mail interview.
"I believe that data mining technology can be useful," he said, noting its success in detecting credit card fraud and money laundering. In national security cases, he said, the government "may have linked several e-mails from a bad guy to other guys that we know nothing about. Before you can determine whether that guy is good or bad, you first need to intercept" the e-mails.
Some experts say it's all right to use data mining against terrorists, but not against domestic crooks.
"My concern is that the government can't distinguish between fighting the war against militant Islam and ordinary crimes," Stanford's Ullman said in an e-mail. "Just like bank robbery differs in degree from going through a stop sign, terrorism differs in degree from drug crime. ... It's OK to use such a system to pursue terrorists. In fact, I believe it is essential. But we need safeguards to assure it will not be used to track 'ordinary' criminals."
For more background information, go to www.twocrows.com and click on "About Data Mining."
(c) 2006, Knight Ridder/Tribune Information Services.