
Rock Algorithm

Submitted By DonOzone
Introduction
Clustering, in data mining, is useful for discovering distribution patterns in the underlying data. Our interest is in clustering non-numerical data, that is, data with categorical or Boolean attributes. One example of a hierarchical clustering algorithm for such data is ROCK (RObust Clustering using linKs). Clustering groups data points so that the points within a single cluster have similar characteristics while points in different clusters are dissimilar. ROCK belongs to the class of agglomerative hierarchical clustering algorithms.
The ROCK algorithm has three main steps, performed in order: 'Draw random sample', 'Cluster with links', and 'Label data in disk'.
ROCK's hierarchical algorithm accepts as input the set S of N sample points to be clustered and the number of desired clusters K. The first step in the procedure is to compute the number of links between pairs of points. Initially, each point is a separate cluster. For each cluster i, we build a local heap q[i] and maintain the heap during the execution of the algorithm. q[i] contains every cluster j such that link[i,j] is non-zero. The clusters j in q[i] are ordered in decreasing order of the goodness measure with respect to i, g(i,j).
In addition to the local heaps q[i] for each cluster i, the algorithm also maintains a global heap q that contains all the clusters. The clusters in q are ordered in decreasing order of their best goodness measures. Thus, g(j, max(q[j])) is used to order the various clusters j in q, where max(q[j]), the max element in q[j], is the best cluster to merge with cluster j. At each step, the max cluster j in q and the max cluster in q[j] are the best pair of clusters to be merged.
To compute the links, the algorithm first builds a list of neighbors for every point and then considers all pairs of that point's neighbors. For each such pair, the point contributes one link.
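The link-counting step just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the essay's own program: the function name `count_links` and the toy neighbor lists are hypothetical, and it assumes that a point is not listed as its own neighbor.

```python
from collections import defaultdict
from itertools import combinations


def count_links(neighbors):
    """Count links from neighbor lists.

    neighbors: dict mapping a point id to the list of its neighbor ids
    (assumption: a point does not appear in its own list).
    Returns a dict mapping a sorted pair (a, b) to its link count.
    """
    links = defaultdict(int)
    for nbrs in neighbors.values():
        # Each point contributes one link to every pair of its neighbors.
        for a, b in combinations(sorted(nbrs), 2):
            links[(a, b)] += 1
    return dict(links)


# Hypothetical toy input: four points and their neighbor lists.
neighbors = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2],
}
links = count_links(neighbors)
```

Pairs that never appear together in any neighbor list, such as (2, 3) here, simply have no entry, which corresponds to a link count of zero.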
If this process is repeated for every point and the link count is incremented for each pair of neighbors, then at the end the link counts for all pairs of points will have been obtained. If M_i is the size of the neighbor list for point i, then for point i we have to increase the link count by one in M_i^2 entries. Thus, the complexity of the algorithm is the sum of M_i^2 over all points, which is O(N * M_m * M_a), where M_a and M_m are the average and maximum number of neighbors for a point, respectively. In the worst case, the value of M_m can be N, in which case the complexity of the algorithm becomes O(M_a * N^2). In practice, we expect M_m to be reasonably close to M_a, and for these cases the complexity reduces to O(M_a^2 * N) on average.
ROCK performs agglomerative hierarchical clustering and explores the concept of links for data with categorical attributes. Its key concepts include:
a. Links: The number of links between two objects is the number of neighbors they have in common.
b. Neighbors: If the similarity between two points exceeds a certain similarity threshold, they are neighbors.
c. Criterion function: The objective is to maximize the criterion function to get good-quality clusters. Maximizing here means maximizing the sum of links between point pairs within the same cluster while minimizing the sum of links between point pairs in different clusters.
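The three concepts above can be tied together in a small Python sketch. Several choices here are assumptions that the essay does not state: Jaccard similarity as the similarity measure, a threshold test of `>=`, and the goodness measure with f(theta) = (1 - theta) / (1 + theta) as given in the original ROCK paper. The function names and the toy records are hypothetical, and the merge loop rescans all cluster pairs for simplicity rather than maintaining the local and global heaps.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two item sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def neighbor_lists(points, theta):
    """Points i and j are neighbors if their similarity is at least theta."""
    n = len(points)
    return {
        i: [j for j in range(n)
            if j != i and jaccard(points[i], points[j]) >= theta]
        for i in range(n)
    }


def goodness(cross_links, n_i, n_j, theta):
    """Goodness of merging clusters of sizes n_i and n_j (0 < theta < 1)."""
    expo = 1 + 2 * (1 - theta) / (1 + theta)
    return cross_links / ((n_i + n_j) ** expo - n_i ** expo - n_j ** expo)


def rock(points, theta, k):
    """Greedy merging without heaps: a simplified sketch, not full ROCK."""
    nbr_sets = {i: set(v) for i, v in neighbor_lists(points, theta).items()}

    def link(p, q):
        # Number of common neighbors of p and q.
        return len(nbr_sets[p] & nbr_sets[q])

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            cross = sum(link(p, q) for p in clusters[a] for q in clusters[b])
            if cross == 0:
                continue  # clusters with no links are never merged
            g = goodness(cross, len(clusters[a]), len(clusters[b]), theta)
            if best is None or g > best[0]:
                best = (g, a, b)
        if best is None:
            break  # no linked pair of clusters is left
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy records, each a set of categorical items.
points = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"x", "y"}]
clusters = rock(points, theta=0.5, k=2)
```

On this toy input the first three records end up in one cluster and the last record stays alone, because it shares no neighbors with the others. For larger inputs the heap-based bookkeeping described earlier avoids rescanning every pair at each merge.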
A real-life application of the ROCK algorithm is the clustering of congressional votes, using the US Congressional Voting Records data set from 1984. Each record corresponds to one congressman's votes on 16 issues. All attributes are Boolean values, and very few contain missing values. A classification label of Republican or Democrat is provided with each data record. The data set contains records for 168 Republicans and 267 Democrats.
The following are the general applications of clustering:
1. Pattern recognition
2. Spatial data analysis:
– create thematic maps in GIS by clustering feature spaces
– detect spatial clusters and explain them in spatial data mining
3. Image Processing
4. Economic Science (especially market research)
5. World Wide Web:
– Document classification
– Cluster Weblog data to discover groups of similar access patterns
Examples of clustering applications include:
1. Software clustering: cluster files in software systems based on their functionality
2. Intrusion detection: Discover instances of anomalous (intrusive) user behavior in large system log files
3. Gene expression data: Discover genes with similar functions in DNA microarray data.
4. Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
5. Land use: Identification of areas of similar land use in an earth observation database
6. Insurance: Identifying groups of motor insurance policy holders with a high average claim cost
7. City-planning: Identify groups of houses according to their house type, value, and geographical location.
8. Earthquake studies: Observed earthquake epicenters should be clustered along continental faults.
9. Biology: Plant and animal taxonomies, gene functionality
