11数据仓库技术讲座_57.pptx
《11数据仓库技术讲座_57.pptx》由会员分享,可在线阅读,更多相关《11数据仓库技术讲座_57.pptx(67页珍藏版)》请在冰点文库上搜索。
2023年7月1日星期六,DataWarehousingandOLAPTechnology,1,数据仓库和OLAP技术,什么是数据仓库(Whatisadatawarehouse)?
多维数据模型(Amulti-dimensionaldatamodel)数据仓库体系结构(Datawarehousearchitecture)数据仓库实现(Datawarehouseimplementation)FurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining,2023年7月1日星期六,DataWarehousingandOLAPTechnology,2,数据库的定义,传统的数据库技术是以单一的数据资源为中心,同时进行从事务处理,批处理到决策分析的各类处理;数据库主要是为自动化,精简工作任务和高速数据采集服务的。
它的运行是事务驱动,面向应用的,数据库的根本任务是完成数据操作,即及时安全地将当前事务所产生的记录保存下来。
2023年7月1日星期六,DataWarehousingandOLAPTechnology,3,两种不同的数据处理需求,计算机系统中存在着两类不同的数据处理需求,即:
操作型处理(事务处理):
主要是对一个或一组记录的查询和修改,这时候人们关心的是响应时间、数据的安全性和完整性;分析型处理(信息型处理):
用于管理人员的决策分析,如DDS(decisionsupportsystem)、多维分析等。
2023年7月1日星期六,DataWarehousingandOLAPTechnology,4,为什么要建立数据仓库?
数据DATA,知识KNOWLEDGE,决定DECISIONS,PatternsTrendsFactsRelationsModelsAssociationsSequences,TargetMarketsFundsallocationTradingoptionsWheretoadvertiseCatalogmailinglistSalesgeography,财经的Financial经济的Economic政府Government销售分数Point-of-Sale人口统计学Demographic生活方式Lifestyle,痛苦:
太多数据,无法作出正确判断!
2023年7月1日星期六,DataWarehousingandOLAPTechnology,5,WhatisDataWarehouse?
数据仓库是在企业管理和决策中面向主题的,集成的,与时间相关的和不可修改的数据集合“Adatawarehouseisasubject-oriented,integrated,time-variant,andnonvolatilecollectionofdatainsupportofmanagementsdecision-makingprocess.”W.H.InmonDatawarehousing:
Theprocessofconstructingandusingdatawarehouses,2023年7月1日星期六,DataWarehousingandOLAPTechnology,6,DataWarehouseSubject-Oriented,Organizedaroundmajorsubjects,suchascustomer,product,sales.Focusingonthemodelingandanalysisofdatafordecisionmakers,notondailyoperationsortransactionprocessing.Provideasimpleandconciseviewaroundparticularsubjectissuesbyexcludingdatathatarenotusefulinthedecisionsupportprocess.,2023年7月1日星期六,DataWarehousingandOLAPTechnology,7,面向应用举例采购子系统:
订单(订单号,供应商号,总金额,日期)订单细则(订单号,商品号,类别,单价,数量)供应商(供应商号,供应商名,地址,电话)销售子系统:
顾客(顾客号,姓名,性别,年龄,地址,电话)销售(员工号,顾客号,商品号,数量,单价日期)库存管理子系统:
领料单(领料单号,领料人,商品号,数量,日期)进料单(进料单号,订单号,进料人,收料人,日期)库存(商品号,库房号,库存量,日期)库房(库房号,仓库保管员,地点,库存商品描述)人事管理子系统:
员工(员工号,姓名,性别,年龄,部门号)部门(部门号,部门名称,部门主管,电话),面向主题举例:
商品:
商品固有信息:
商品号,商品名,类别,颜色等商品采购信息:
商品号,供应商号,供应价,供应日期,供应量等商品销售信息:
商品号,顾客号,售价,销售日期,销售量等商品库存信息:
商品号,库房号,日期,库存量等供应商:
供应商固有信息:
供应商号,供应商名,地址,电话等供应商品信息:
供应商号,商品号,供应价,供应日期,供应量等顾客:
顾客固有信息:
顾客号,顾客名,性别,年龄,住址,电话等顾客购物信息:
顾客号,商品号,售价,购买日期,购买量等,2023年7月1日星期六,DataWarehousingandOLAPTechnology,8,DataWarehouseIntegrated,Constructedbyintegratingmultiple,heterogeneousdatasourcesrelationaldatabases,flatfiles,on-linetransactionrecordsDatacleaninganddataintegrationtechniquesareapplied.Ensureconsistencyinnamingconventions,encodingstructures,attributemeasures,etc.amongdifferentdatasourcesE.g.,Hotelprice:
currency,tax,breakfastcovered,etc.Whendataismovedtothewarehouse,itisconverted.,2023年7月1日星期六,DataWarehousingandOLAPTechnology,9,DataWarehouseTimeVariant,Thetimehorizonforthedatawarehouseissignificantlylongerthanthatofoperationalsystems.Operationaldatabase:
currentvaluedata.Datawarehousedata:
provideinformationfromahistoricalperspective(e.g.,past5-10years)EverykeystructureinthedatawarehouseContainsanelementoftime,explicitlyorimplicitlyButthekeyofoperationaldatamayormaynotcontain“timeelement”.,2023年7月1日星期六,DataWarehousingandOLAPTechnology,10,DataWarehouseNon-Volatile,Aphysicallyseparatestoreofdatatransformedfromtheoperationalenvironment.Operationalupdateofdatadoesnotoccurinthedatawarehouseenvironment.Doesnotrequiretransactionprocessing,recovery,andconcurrencycontrolmechanismsRequiresonlytwooperationsindataaccessing:
initialloadingofdataandaccessofdata.,2023年7月1日星期六,DataWarehousingandOLAPTechnology,11,DataWarehousevs.HeterogeneousDBMS,TraditionalheterogeneousDBintegration:
Buildwrappers/mediatorsontopofheterogeneousdatabasesQuerydrivenapproachWhenaqueryisposedtoaclientsite,ameta-dictionaryisusedtotranslatethequeryintoqueriesappropriateforindividualheterogeneoussitesinvolved,andtheresultsareintegratedintoaglobalanswersetComplexinformationfiltering,competeforresourcesDatawarehouse:
update-driven,highperformanceInformationfromheterogeneoussourcesisintegratedinadvanceandstoredinwarehousesfordirectqueryandanalysis,2023年7月1日星期六,DataWarehousingandOLAPTechnology,12,DataWarehousevs.OperationalDBMS,OLTP(on-linetransactionprocessing)MajortaskoftraditionalrelationalDBMSDay-to-dayoperations:
purchasing,inventory,banking,manufacturing,payroll,registration,accounting,etc.OLAP(on-lineanalyticalprocessing)MajortaskofdatawarehousesystemDataanalysisanddecisionmakingDistinctfeatures(OLTPvs.OLAP):
Userandsystemorientation:
customervs.marketDatacontents:
current,detailedvs.historical,consolidatedDatabasedesign:
ER+applicationvs.star+subjectView:
current,localvs.evolutionary,integratedAccesspatterns:
updatevs.read-onlybutcomplexqueries,2023年7月1日星期六,DataWarehousingandOLAPTechnology,13,OLTPvs.OLAP,2023年7月1日星期六,DataWarehousingandOLAPTechnology,14,WhySeparateDataWarehouse?
HighperformanceforbothsystemsDBMStunedforOLTP:
accessmethods,indexing,concurrencycontrol,recoveryWarehousetunedforOLAP:
complexOLAPqueries,multidimensionalview,consolidation.Differentfunctionsanddifferentdata:
missingdata:
DecisionsupportrequireshistoricaldatawhichoperationalDBsdonottypicallymaintaindataconsolidation:
DSrequiresconsolidation(aggregation,summarization)ofdatafromheterogeneoussourcesdataquality:
differentsourcestypicallyuseinconsistentdatarepresentations,codesandformatswhichhavetobereconciled,2023年7月1日星期六,DataWarehousingandOLAPTechnology,15,DataWarehousingandOLAPTechnology,Whatisadatawarehouse?
Amulti-dimensionaldatamodelDatawarehousearchitectureDatawarehouseimplementationFurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining,2023年7月1日星期六,DataWarehousingandOLAPTechnology,16,FromTablesandSpreadsheetstoDataCubes,AdatawarehouseisbasedonamultidimensionaldatamodelwhichviewsdataintheformofadatacubeAdatacube,suchassales,allowsdatatobemodeledandviewedinmultipledimensionsDimensiontables,suchasitem(item_name,brand,type),ortime(day,week,month,quarter,year)Facttablecontainsmeasures(suchasdollars_sold)andkeystoeachoftherelateddimensiontablesIndatawarehousingliterature,ann-Dbasecubeiscalledabasecuboid.Thetopmost0-Dcuboid,whichholdsthehighest-levelofsummarization,iscalledtheapexcuboid.Thelatticeofcuboidsformsadatacube.,2023年7月1日星期六,DataWarehousingandOLAPTechnology,17,Cube:
ALatticeofCuboids,all,time,item,location,supplier,time,item,time,location,time,supplier,item,location,item,supplier,location,supplier,time,item,location,time,item,supplier,time,location,supplier,item,location,supplier,time,item,location,supplier,0-D(apex)cuboid,1-Dcuboids,2-Dcuboids,3-Dcuboids,4-D(base)cuboid,2023年7月1日星期六,DataWarehousingandOLAPTechnology,18,ConceptualModelingofDataWarehouses,Modelingdatawarehouses:
dimensions&measuresStarschema:
AfacttableinthemiddleconnectedtoasetofdimensiontablesSnowflakeschema:
Arefinementofstarschemawheresomedimensionalhierarchyisnormalizedintoasetofsmallerdimensiontables,formingashapesimilartosnowflakeFactconstellations:
Multiplefacttablessharedimensiontables,viewedasacollectionofstars,thereforecalledgalaxyschemaorfactconstellation,2023年7月1日星期六,DataWarehousingandOLAPTechnology,19,ExampleofStarSchema,SalesFactTable,time_key,item_key,branch_key,location_key,units_sold,dollars_sold,avg_sales,Measures,2023年7月1日星期六,DataWarehousingandOLAPTechnology,20,ExampleofSnowflakeSchema,SalesFactTable,time_key,item_key,branch_key,location_key,units_sold,dollars_sold,avg_sales,Measures,2023年7月1日星期六,DataWarehousingandOLAPTechnology,21,ExampleofFactConstellation,SalesFactTable,time_key,item_key,branch_key,location_key,units_sold,dollars_sold,avg_sales,Measures,ShippingFactTable,time_key,item_key,shipper_key,from_location,to_location,dollars_cost,units_shipped,2023年7月1日星期六,DataWarehousingandOLAPTechnology,22,ADataMiningQueryLanguage,DMQL:
LanguagePrimitives,CubeDefinition(FactTable)definecube:
DimensionDefinition(DimensionTable)definedimensionas()SpecialCase(SharedDimensionTables)Firsttimeas“cubedefinition”definedimensionasincube,2023年7月1日星期六,DataWarehousingandOLAPTechnology,23,DefiningaStarSchemainDMQL,definecubesales_startime,item,branch,location:
dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city,province_or_state,country),2023年7月1日星期六,DataWarehousingandOLAPTechnology,24,DefiningaSnowflakeSchemainDMQL,definecubesales_snowflaketime,item,branch,location:
dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier(supplier_key,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city(city_key,province_or_state,country),2023年7月1日星期六,DataWarehousingandOLAPTechnology,25,DefiningaFactConstellationinDMQL,definecubesalestime,item,branch,location:
dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city