

DATA WAREHOUSE

Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. Many organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars building enterprise-wide data warehouses. Many people feel that, with competition mounting in every industry, data warehousing is the latest must-have marketing weapon, a way of understanding customers' needs.

"So," you may ask, full of intrigue, "what exactly is a data warehouse?"
Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.

According to W. H. Inmon, a leading architect in the construction of data warehouse systems, "a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems. Let us take a closer look at each of these four features.

(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.

In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.

"OK," you now ask, "what, then, is data warehousing?"

Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows "knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse.

"How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:

(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending);

(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic region, in order to fine-tune production strategies;

(3) analyzing operations and looking for sources of profit;

(4) managing customer relationships, making environmental corrections, and managing the cost of corporate assets.
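The two halves of data warehousing described above, constructing a warehouse (integration and cleaning) and then using it (for example, comparing sales by quarter and region), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems, their field names, the region codes, and the amounts are all hypothetical and do not come from the text.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational sources with inconsistent naming and encodings.
STORE_ROWS = [  # in-store system: one-letter region code, amount in dollars
    {"sold_on": date(2023, 1, 15), "region": "N", "amount": 120.0},
    {"sold_on": date(2023, 4, 2), "region": "S", "amount": 80.0},
]
WEB_ROWS = [  # web shop: full region name, amount in cents
    {"order_date": date(2023, 2, 20), "area": "north", "cents": 4550},
    {"order_date": date(2023, 5, 9), "area": "south", "cents": 9900},
]

REGION_NAMES = {"N": "north", "S": "south"}


def quarter_of(day):
    """Return a label such as '2023-Q1' for a calendar date."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def integrate(store_rows, web_rows):
    """Clean both sources into one consistent record layout."""
    unified = []
    for row in store_rows:
        unified.append({
            "date": row["sold_on"],
            "region": REGION_NAMES[row["region"]],  # unify the encoding
            "amount": row["amount"],
        })
    for row in web_rows:
        unified.append({
            "date": row["order_date"],
            "region": row["area"],
            "amount": row["cents"] / 100,  # unify the unit of measure
        })
    return unified


def sales_by_quarter_and_region(warehouse_rows):
    """Summarize sales so that quarters and regions can be compared."""
    totals = defaultdict(float)
    for row in warehouse_rows:
        totals[(quarter_of(row["date"]), row["region"])] += row["amount"]
    return dict(totals)


warehouse = integrate(STORE_ROWS, WEB_ROWS)
summary = sales_by_quarter_and_region(warehouse)
for key in sorted(summary):
    print(key, summary[key])
```

Every integrated record keeps its date, which reflects the time-variant property, and the sketch only loads and reads records, which reflects the nonvolatile property.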
Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. Integrating such data and providing easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.

The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set.

Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system because data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantically consistent data store.

1. Differences between operational database systems and data warehouses

Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.

The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or "knowledge workers" in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of different users. These systems are known as on-line analytical processing (OLAP) systems.

The major distinguishing features between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.

(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.

Other features that distinguish OLTP and OLAP systems include database size, frequency of operations, and performance metrics.

2. But, why have a separate data warehouse?
"Since operational databases store huge amounts of data," you observe, "why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?"

A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing "canned" queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.
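The difference between the two workloads can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a recommended design: the table names (`orders`, `dim_region`, `dim_time`, `fact_sales`), the columns, and the data are assumptions made for the example, and a single in-memory database holds both sides purely for brevity, whereas the text argues that real systems keep them separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational (OLTP) side: one row per order, addressed by primary key.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        region TEXT, year INTEGER, quarter INTEGER, amount REAL
    );
    -- Warehouse (OLAP) side: a small star schema (hypothetical layout).
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE dim_time (
        time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER,
        UNIQUE (year, quarter)
    );
    CREATE TABLE fact_sales (
        region_id INTEGER REFERENCES dim_region(region_id),
        time_id INTEGER REFERENCES dim_time(time_id),
        amount REAL
    );
""")

# OLTP-style access: short, atomic transactions that touch single records.
new_orders = [("north", 2023, 1, 100.0), ("north", 2023, 2, 150.0),
              ("south", 2023, 1, 75.0), ("south", 2023, 2, 30.0)]
for order in new_orders:
    with conn:  # each order commits (or rolls back) on its own
        conn.execute(
            "INSERT INTO orders (region, year, quarter, amount) "
            "VALUES (?, ?, ?, ?)",
            order,
        )
one_order = conn.execute(
    "SELECT region, amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# Initial load of the warehouse from the operational data.
with conn:
    conn.execute(
        "INSERT INTO dim_region (name) SELECT DISTINCT region FROM orders")
    conn.execute(
        "INSERT INTO dim_time (year, quarter) "
        "SELECT DISTINCT year, quarter FROM orders")
    conn.execute("""
        INSERT INTO fact_sales (region_id, time_id, amount)
        SELECT r.region_id, t.time_id, o.amount
        FROM orders AS o
        JOIN dim_region AS r ON r.name = o.region
        JOIN dim_time AS t ON t.year = o.year AND t.quarter = o.quarter
    """)

# OLAP-style access: a read-only query that joins and aggregates many rows.
olap_rows = conn.execute("""
    SELECT r.name, t.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY r.name, t.year
    ORDER BY r.name
""").fetchall()
print(one_order)
print(olap_rows)
```

The primary-key lookup reads a single row, while the aggregate query must read every fact row. On a large operational database, that second kind of scan is the workload that would compete with the short transactions.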
Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources.

Data Warehouse

Data warehousing provides architectures and tools for business operations so that data can be systematically organized, understood, and used for decision making. Many organizations have found that the data warehouse is a very useful tool in today's competitive and fast-developing world.
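In recent years, many companies have spent millions of dollars building enterprise-wide data warehouses. Many people also believe that, as competition intensifies, data warehousing has become an essential marketing tool, a weapon for understanding customers' needs.

"So," you may ask, full of curiosity, "what exactly is a data warehouse?"

Data warehouses have been defined in different ways, but a rigorous definition is hard to give. Loosely speaking, a data warehouse is a database that is maintained separately from an organization's operational databases. Data warehouses allow different application systems to be integrated and support information processing by providing a solid platform of consolidated, historical data for analysis.

According to W. H. Inmon, a leading architect in the construction of data warehouses, "a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making." This short but comprehensive definition states the main characteristics of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data storage systems. Let us now look at each of the four features.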
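(1) Subject-oriented: A data warehouse is organized around major subjects such as customer, supplier, product, and sales. It focuses on modeling and analyzing data for decision makers rather than on an organization's day-to-day operations and transaction processing. A data warehouse therefore excludes data that are of no value to the decision process.

(2) Integrated: A data warehouse is usually built from multiple data sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

(3) Time-variant: Data are stored to provide information from a historical perspective. Every key structure in the data warehouse contains, explicitly or implicitly, an element of time.

(4) Nonvolatile: A data warehouse is a physically separate store of data. Because of this separation, it does not require transaction processing, recovery, or concurrency control mechanisms. It usually needs only two kinds of data access: the initial loading of data and access of data.

In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information an enterprise needs to make strategic decisions. A data warehouse is also often regarded as an architecture, built by integrating data, that supports structured and ad hoc queries, analytical reporting, and decision making.

"OK," you now ask, "what, then, is data warehousing?"

Based on the above, we view data warehousing as the process of constructing and using data warehouses. Constructing a data warehouse requires data integration, data cleaning, and data consolidation. Using a data warehouse often requires a collection of decision support technologies. This allows knowledge workers to use the warehouse to obtain an overview of the data quickly and conveniently and to make sound decisions based on the information in the warehouse. Some people use the term "data warehousing" only for the process of constructing a data warehouse and the term "warehouse DBMS" for managing and using data warehouses. We will not distinguish between the two.

"How do organizations use the information from data warehouses?" Many organizations use this information to support decision making activities, including:

(1) increasing customer focus, which includes analyzing customer buying patterns (such as buying preferences, buying time, budget cycles, and spending habits);

(2) repositioning products and managing product portfolios by comparing sales performance by quarter, by year, and by region, in order to adjust production strategies;

(3) analyzing operations and looking for sources of profit;

(4) managing customer relationships, making environmental corrections, and managing the cost of corporate assets.

Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations collect diverse kinds of data and maintain large databases from multiple heterogeneous, autonomous, and distributed data sources. Integrating these data and providing simple, efficient access to them is highly desirable, yet challenging. The database industry and the research community have both put great effort into achieving this goal.

The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple heterogeneous databases. Examples include IBM Data Joiner and Informix DataBlade. When a query is posed to a client site, a metadata dictionary is first used to translate it into queries appropriate for the individual heterogeneous sites. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer. This query-driven approach requires complex information filtering and integration processes, and it competes for resources with processing at the local sources. It is inefficient and potentially expensive for frequent queries, especially queries that require aggregation.

Data warehousing provides an interesting alternative to this traditional approach. Rather than a query-driven approach, data warehousing uses an update-driven approach in which information from multiple heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system, because data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantically consistent data store. Query processing in the data warehouse does not interfere with processing at the local sources. Moreover, data warehouses can store and integrate historical information and support complex queries. As a result, data warehousing has become very popular in industry.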
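The contrast between the query-driven and update-driven approaches can be made concrete with a small simulation. The following is a rough sketch under stated assumptions: the two sources, the `query_source` helper, and the call counter are hypothetical stand-ins for local query processors and do not model any real mediator product.

```python
from collections import Counter

# Two hypothetical local sources; each counted call stands for work at a site.
SOURCE_A = [("north", 100.0), ("south", 70.0)]
SOURCE_B = [("north", 150.0), ("south", 30.0)]
source_calls = Counter()


def query_source(name, rows, region):
    """Simulate a local query processor answering one translated query."""
    source_calls[name] += 1
    return sum(amount for row_region, amount in rows if row_region == region)


def query_driven_total(region):
    """Mediator style: ask every source at query time and merge the results."""
    return (query_source("A", SOURCE_A, region)
            + query_source("B", SOURCE_B, region))


def build_warehouse():
    """Update-driven style: integrate and summarize all sources in advance."""
    totals = {}
    for name, rows in (("A", SOURCE_A), ("B", SOURCE_B)):
        source_calls[name] += 1  # one bulk extraction per source
        for region, amount in rows:
            totals[region] = totals.get(region, 0.0) + amount
    return totals


# Query-driven: three queries cost three round trips to each source.
query_driven_answers = [query_driven_total("north") for _ in range(3)]
calls_query_driven = dict(source_calls)

# Update-driven: one load, then every query reads only the warehouse.
source_calls.clear()
warehouse = build_warehouse()
update_driven_answers = [warehouse["north"] for _ in range(3)]
calls_update_driven = dict(source_calls)

print(query_driven_answers, calls_query_driven)
print(update_driven_answers, calls_update_driven)
```

Both approaches return the same totals, but repeated queries keep hitting the local sources only in the query-driven case. The price of the update-driven approach is that the warehouse reflects the sources as of the last load, which matches the remark that data warehouses do not contain the most current information.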
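1. Differences between operational database systems and data warehouses

Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing the two.

The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of an organization's day-to-day operations, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or "knowledge workers" in data analysis and decision making. Such systems can organize and present data in various formats to meet the diverse needs of different users. These systems are called on-line analytical processing (OLAP) systems.

The main differences between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that are typically too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship data model and an application-oriented database design. An OLAP system adopts a star or snowflake model and a subject-oriented database design.

(4) View: An OLTP system focuses on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema because of the evolution of the organization. OLAP systems also handle data from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: Access to an OLTP system consists mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. Accesses to OLAP systems, however, are mostly read-only, although many of them may be complex queries.

Other features that distinguish OLTP and OLAP systems include database size, frequency of operations, and performance metrics.

2. But, why have a separate data warehouse?

"Since operational databases store huge amounts of data," you observe, "why not perform on-line analytical processing directly on such databases instead of spending a great deal of time and resources to construct a separate data warehouse?"

A major reason for the separation is to promote the high performance of both systems. An operational database is designed for known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing "canned" queries. Data warehouse queries, on the other hand, are often complex. They involve computation over large groups of data at summarized levels, and some of them require special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access to data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied to such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.

Finally, the separation of operational databases from data warehouses is based on ...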