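饮水思源 (Yinshui Siyuan) BBS - CS Digest Article Reader
Posted by: boo (二当家~将e进行到底), Board: CS
Subject: [Repost] CIA Using 'Data Mining' Technology to Find N ┅
Posting site: 饮水思源 BBS (Sat Mar  3 09:28:07 2001), forwarded

[ The following text is reposted from the E_Commerce board ]
[ Originally posted by boo ]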
Friday March 2 3:43 PM ET

             CIA Using 'Data Mining' Technology to Find Nuggets
  
                           By Tabassum Zakaria

   LANGLEY, Va. (Reuters) - The CIA, faced with a daily avalanche of
information, is using new ``data mining'' technology to find useful nuggets
within thousands of documents and broadcasts in different languages.

   The spy agency must sift through a barrage of information from both
classified and unclassified sources in varied formats such as hard text,
digital text, imagery, and audio in more than 35 languages.

   The Office of Advanced Information Technology (AIT), part of the CIA's
Directorate of Science and Technology, is focused on finding solutions to
the ``volume challenge.''

   ``We're not growing at a fast rate, but the amount of information that
comes into this place is growing by leaps and bounds,'' Larry Fairchild,
AIT director, said in an interview this week in a basement demonstration
room at Central Intelligence Agency headquarters.

   ``How do we give folks technologies so that they are able to handle the
big increase in information they're going to have to deal with on a
day-to-day basis?'' he said.

   One computer tool called ``Oasis'' can convert audio signals from
television and radio broadcasts into text.

   It can distinguish accented English for greater accuracy in the
transcription, whether the speaker is male or female, and whether one male
or female voice is different from another of the same gender.

   At the left of the screen of a transcribed broadcast are labels
``Male 1,'' ``Female 1,'' ``Male 2,'' next to sentences.

   If one voice is labeled with a name, the computer from then on will put
that name on anything else with that same voice.

   So, for example, if a broadcast by Saudi exile Osama bin Laden, whom the
CIA considers a major threat to Americans, was transcribed and labeled,
every time his voice was detected the computer would automatically label
it.
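
   The labeling behavior described above can be pictured with a small
sketch. This is only an illustration under stated assumptions, not how
Oasis is built: the Segment type, the label_transcript function, and the
sample name ``Speaker A'' are all hypothetical, and the hard part (deciding
that two stretches of audio come from the same voice) is assumed to have
been done already.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    voice_id: str  # anonymous label assumed to come from a voice-clustering step
    text: str      # transcribed sentence


def label_transcript(segments, known_names):
    """Return (label, text) pairs, using a name wherever the voice has one."""
    return [(known_names.get(s.voice_id, s.voice_id), s.text) for s in segments]


# Made-up sample transcript with two distinct voices.
segments = [
    Segment("Male 1", "First sentence."),
    Segment("Female 1", "Second sentence."),
    Segment("Male 1", "Third sentence."),
]

# The analyst names one voice once (hypothetical name).
known_names = {"Male 1": "Speaker A"}

labeled = label_transcript(segments, known_names)
for label, text in labeled:
    print(f"{label}: {text}")
```

   Naming ``Male 1'' once is enough for every segment with that voice to
carry the name, while unnamed voices keep their generic labels.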

Machine Translator

   If the machine translation appears off, the user can with a mouse click
hear the actual broadcast. For example, the demonstration showed a
transcription that read ``latest danger from hell'' but the audio said
``latest danger from el nino.''

   The computer cuts down on the time it would take a person to transcribe
a half-hour broadcast to 10 minutes from up to 90 minutes, a CIA employee
conducting the demonstration said.

   The CIA is planning to have Oasis developed for different languages such
as Arabic and Chinese.

   It also finds similar meanings of words being searched. For example, a
broadcast might not mention ``terrorism'' but might say ``car bombing,''
which the computer would tag as ``terrorism'' so that anyone searching for
that category would find it.
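
   One simple way to picture this kind of category tagging is a lookup
table from phrases to categories. The sketch below is an assumption made
for illustration only: the article does not say how Oasis decides that
``car bombing'' belongs under ``terrorism,'' and the CATEGORY_PHRASES
table, tag_categories, and search are hypothetical names.

```python
# Hypothetical phrase table; a real system would need a far richer vocabulary.
CATEGORY_PHRASES = {
    "terrorism": ["terrorism", "car bombing"],
}


def tag_categories(text):
    """Return the set of categories whose phrases appear in the text."""
    lowered = text.lower()
    return {
        category
        for category, phrases in CATEGORY_PHRASES.items()
        if any(phrase in lowered for phrase in phrases)
    }


def search(transcripts, category):
    """Return the transcripts tagged with the requested category."""
    return [t for t in transcripts if category in tag_categories(t)]


# Made-up sample transcripts.
transcripts = [
    "Police reported a car bombing near the market.",
    "Heavy rain is expected over the weekend.",
]

hits = search(transcripts, "terrorism")
print(hits)
```

   The first transcript never uses the word ``terrorism,'' yet a search for
that category still returns it because the phrase ``car bombing'' maps to
the same tag.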

   Currently the CIA's Foreign Broadcast Information Service is using it in
one Asian city and intends to have it in other regions such as the Middle
East this year.

   Another computer tool, ``FLUENT,'' enables a user to conduct computer
searches of documents that are in a language the user does not understand.

   The user can put English words into the search field, such as ``nuclear
weapons,'' and documents in languages such as Russian, Chinese and Arabic
pop up.

   The system will then translate the document and if it is seen as useful,
the analyst can send it to a human translator for more precision.

   Languages that FLUENT can translate into English include Chinese, Korean,
Portuguese, Russian, Serbo-Croatian and Ukrainian.
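
   The search step can be sketched as translating the English query into
each target language and matching the translated term against foreign
documents. This is a toy illustration under loud assumptions: the article
gives no detail on how FLUENT works, the GLOSSARY table and the
cross_language_search function are hypothetical, and plain substring
matching would miss inflected word forms that a real system must handle.

```python
# Hypothetical bilingual glossary: English query -> term per language code.
GLOSSARY = {
    "nuclear weapons": {"ru": "ядерное оружие", "zh": "核武器"},
}


def cross_language_search(query, documents):
    """Return (lang, text) pairs whose text contains the translated query."""
    translations = GLOSSARY.get(query.lower(), {})
    hits = []
    for lang, text in documents:
        term = translations.get(lang)
        if term and term in text.lower():
            hits.append((lang, text))
    return hits


# Made-up sample documents, each tagged with its language code.
documents = [
    ("ru", "Статья про ядерное оружие."),
    ("zh", "这篇文章讨论核武器。"),
    ("ru", "Прогноз погоды на завтра."),
]

hits = cross_language_search("nuclear weapons", documents)
for lang, text in hits:
    print(lang, text)
```

   The analyst types only English; the two documents that mention the term
in Russian and Chinese are returned, and the unrelated Russian document is
not.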

   ``Data mining'' tools are used to extract key pieces of information from
a variety of intelligence traffic such as on the flow of illegal drugs and
also to keep track of illicit financial transactions.

   Tools were developed to help CIA analysts on Iraq, who were asked to
analyze the agency's holdings on Iraqi war crime violations, about 1.2
million documents going back to 1979.

   The Text Data Mining tool extracted and indexed all words in the data.
So, for example, if an analyst was asked whether Iraq ever used anthrax as
a weapon, the analyst could open the tool and find anthrax in the
automatically generated index.

   That tool also counts the frequency of word use and can handle various
spellings of the same Iraqi names or locations.
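
   A word index with frequency counts and spelling normalization can be
sketched in a few lines. This is a minimal illustration, not the agency's
tool: build_index and the VARIANTS table are hypothetical, the sample
documents are invented, and a hand-written variant table stands in for
whatever name-matching method the real tool uses.

```python
import re
from collections import Counter, defaultdict

# Hypothetical table mapping alternate spellings to one canonical form.
VARIANTS = {"basrah": "basra"}


def build_index(docs):
    """Return (word -> set of doc ids, word -> total count) for all docs."""
    index = defaultdict(set)
    freq = Counter()
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            word = VARIANTS.get(word, word)  # fold spelling variants together
            index[word].add(doc_id)
            freq[word] += 1
    return index, freq


# Made-up sample documents keyed by document id.
docs = {
    1: "Sample report mentioning anthrax and Basra.",
    2: "Sample cable mentioning Basrah only.",
    3: "Sample memo mentioning anthrax twice: anthrax.",
}

index, freq = build_index(docs)
print(sorted(index["anthrax"]), freq["anthrax"])
print(sorted(index["basra"]), freq["basra"])
```

   Looking up ``anthrax'' in the index lists the documents that contain it,
and both spellings of the place name land under a single entry.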
 
   There is also ``gisting technology'' which gives the flavor of the key
information of a document in a short paragraph, Fairchild said.

   With the latest spy furor in the nation's capital, would any of the tools
help catch a spy?

   ``Yes, some of the things we're doing can,'' Fairchild said without
details. ``We're looking at better technologies to put in that area,'' he
added.

   Another intelligence official, on condition of anonymity, said: ``If they
have this kind of technology to plumb the depths of open sources, you can
imagine what kind of technologies they have to track down spies.''

--
I think, therefore I am

※ Source: ·饮水思源 BBS bbs.sjtu.edu.cn·[FROM: 202.120.7.59]
--
※ Reposted: ·饮水思源 BBS bbs.sjtu.edu.cn·[FROM: 202.120.7.59]
