An Introduction to Data Mining

Discovering hidden value in your data warehouse

Overview

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"

This white paper provides an introduction to the basic technologies of data mining, examples of profitable applications that illustrate its relevance to today’s business environment, and a basic description of how data warehouse architectures can evolve to deliver the value of data mining to end users.

The Foundations of Data Mining

Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

- Massive data collection
- Powerful multiprocessor computers
- Data mining algorithms

Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59% expect to be there by the second quarter of 1996.[1] In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.

In the evolution from business data to business information, each new step has built upon the previous one. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. From the user’s point of view, the four steps listed in Table 1 were revolutionary because they allowed new business questions to be answered accurately and quickly.

Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics
------------------|-------------------|-----------------------|-------------------|----------------
Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery
Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), O...