Statistical Databases

Jaideep Srivastava and Hung Q. Ngo, Department of Computer Science, University of Minnesota, 200 Union Street, EE/CS Building, room 4-192, Minneapolis, MN 55455. E-mail: {srivasta, hngo}@cs.umn.edu

1 Introduction
A statistical database management system (SDBMS) is a database management system that can model, store and manipulate data in a manner well suited to the needs of users who want to perform statistical analyses on the data. Statistical databases have special characteristics and requirements that existing commercial database management systems do not support. For example, while basic aggregation operations like SUM and AVG are part of SQL, there is no support for other commonly used operations like variance and covariance. Such computations, as well as more advanced ones like regression and principal component analysis, are usually performed using statistical packages and libraries, such as SAS [1] and SPSS [2]. From the end user’s perspective, whether the statistical calculations are performed in the database or in a statistical package can be quite transparent, especially from a functionality viewpoint. However, once the datasets to be analyzed grow beyond a certain size, the statistical package approach becomes infeasible, either because the package cannot handle large volumes of data or because computation times become unacceptable and make interactive analysis impossible. With the increasing sophistication of data collection instrumentation, and the availability of cheap, large-volume, high-speed storage devices, most applications today collect data at unprecedented rates. In addition, an increasing number of applications want the ability to perform interactive, on-line analysis of this data in real time, such as “what-if” analysis in forecasting. The emergence of multiple gigabyte corporate data
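To make the point about variance and covariance concrete, the following is a minimal sketch (not taken from the paper) of how these two statistics can be derived inside a database that offers only basic aggregates such as COUNT and SUM. It assumes Python's standard `sqlite3` module and a hypothetical table `obs(x, y)`; the function name `variance_and_covariance` is likewise illustrative. It uses the textbook identities Var(x) = E[x²] − E[x]² and Cov(x, y) = E[xy] − E[x]E[y], which can lose precision on large or badly scaled data, so it should be read as an illustration rather than a recommended implementation.

```python
import sqlite3


def variance_and_covariance(rows):
    """Return (population variance of x, population covariance of x and y).

    Only COUNT and SUM are evaluated inside the database; the final
    arithmetic happens in Python. Assumes `rows` contains at least one
    (x, y) pair. The table name `obs` is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL, y REAL)")
    conn.executemany("INSERT INTO obs VALUES (?, ?)", rows)

    # A single pass over the table collects every sum that is needed.
    n, sx, sy, sxx, sxy = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM obs"
    ).fetchone()
    conn.close()

    mean_x, mean_y = sx / n, sy / n
    var_x = sxx / n - mean_x * mean_x    # E[x^2] - E[x]^2
    cov_xy = sxy / n - mean_x * mean_y   # E[xy] - E[x]E[y]
    return var_x, cov_xy


if __name__ == "__main__":
    print(variance_and_covariance([(1, 2), (2, 4), (3, 6), (4, 8)]))
```

The sketch shows that even a simple statistic needs several auxiliary aggregates and post-processing outside the query. This is the kind of work an SDBMS would be expected to absorb.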
