Consistent Hashing

Submitted By shrushti92
1. Consistent Hashing
Consider the following two scenarios. Describe in each case why consistent hashing is likely to perform better than hashing.
Scenario 1: There is a fixed set of cache servers implementing consistent hashing and a population of clients who have incomplete views of the system, i.e. each client knows about only a fraction of the servers.
Scenario 2: There is a set of cache servers that changes, i.e. nodes come and go.

Answer 1:
Hashing:
Hashing is easy to implement and quick to evaluate. Suppose there is a fixed set of n cache servers (nodes), numbered 0, 1, 2, …, n-1. Under plain hashing, the key-value pair (k, v) is stored on cache ‘hash(k) mod n’, where hash() is any function that converts the arbitrary string k to a non-negative integer. If hash() is a good hash function, keys are distributed evenly across the cluster for any reasonable number of keys.
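The rule ‘hash(k) mod n’ can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the essay: SHA-256 stands in for hash(), and the names stable_hash and server_for are hypothetical.

```python
import hashlib
from collections import Counter

def stable_hash(key: str) -> int:
    """Convert a string to a non-negative integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

def server_for(key: str, n: int) -> int:
    """Plain hashing: the pair (k, v) lives on server hash(k) mod n."""
    return stable_hash(key) % n

n = 4
keys = [f"key-{i}" for i in range(10_000)]

# Count how many keys land on each of the n servers.
load = Counter(server_for(k, n) for k in keys)
for server in sorted(load):
    print(server, load[server])
```

With a reasonable hash function each server should receive close to 10,000 / 4 keys, which is the even distribution described above.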
Consistent Hashing:
Consistent hashing is a cleverer algorithm than plain hashing. Here, the output range of the hash function is treated as a ring, i.e. a fixed circular space in which the largest hash value wraps around to the smallest. Each node in the system is assigned a random value that represents its ‘position’ on the ring. Each data item, identified by a key k, is assigned to a node by hashing the key to yield a position on the ring and then walking clockwise to find the first node with a position larger than the item’s position. If that cache server is down, we go on to the next one, and so on. In this way, each node is responsible for the region of the ring between its predecessor node and itself.
Let us assume that we wrap the unit interval [0,1) onto a circle of unit circumference. The cache servers 0, 1, 2, …, n-1 are then mapped to points on this circle. If the hash function has range [0,R), we rescale it by mapping k to hash(k)/R.
Suppose we want to store a key-value pair (k, v) on the ring. We hash the key onto the circle and store the pair on the first server that appears clockwise from the key’s hash point. Because the hash function is uniform, each machine stores roughly 1/n of the key-value pairs in expectation.
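The ring lookup described above can be sketched as follows. This is a minimal illustration, not a reference implementation: SHA-256 as the hash, R = 2**53, the server names and the names point and Ring are all assumptions made for the example.

```python
import bisect
import hashlib

def point(name: str) -> float:
    """Map a string to a position in [0, 1), i.e. hash(k) / R with R = 2**53."""
    h = int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "big")
    # Keep 53 bits so the division below is exact in floating point.
    return (h >> 11) / 2**53

class Ring:
    def __init__(self, servers):
        # Each server sits at the point given by hashing its name.
        self._points = sorted((point(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        """Return the first server clockwise from the key's hash point."""
        positions = [p for p, _ in self._points]
        i = bisect.bisect_right(positions, point(key))
        # Past the last server, wrap around to the first one on the ring.
        return self._points[i % len(self._points)][1]

servers = ["cache-0", "cache-1", "cache-2"]
ring = Ring(servers)
print(ring.lookup("user:42"))
```

Keeping the positions sorted lets the clockwise walk become a binary search, so a lookup costs O(log n) comparisons once the position list is built.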
Scenario 1:
There is a fixed set of cache servers, but the clients have incomplete views of the system.
In Hashing,
The hash function is fixed as hash(k) mod n, so the result depends on n and on how the servers are numbered. A client that does not see all the nodes effectively computes the mapping over a different set of servers, and will therefore look for data on a server where it does not exist. If the data cannot be fetched from the other nodes because the nodes do not communicate with each other, it is fetched from the main database every time a client queries a cache that does not hold it. As a result, one object ends up assigned to many different caches, which violates the ‘spread’ property of consistency. Similarly, taken over all the client views, the number of distinct objects assigned to a particular cache becomes large because of this replication, which increases the ‘load’. Fetching data also takes longer whenever a client looks on the wrong cache.
In Consistent Hashing,
Each client is assumed to be aware of a constant fraction of the currently operating caches. A client uses a consistent hash function to map an object to one of the caches in its view. The hash function is constructed so that even if the client is unaware of some nodes, the object’s hash value still lies at the same position on the ring. The client walks clockwise from that position to locate the first node in its view. Clients whose views contain the same nearby nodes therefore agree on where the object lives, so even in the presence of inconsistent views, references for a given object are directed to only a small number of caching machines. The total number of different caches to which an object is assigned is small, and the number of distinct objects assigned to a particular cache is small. Thus the ‘spread’ and ‘load’ properties of consistency are maintained, and fetching data takes much less time than with plain hashing.
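One consequence of the clockwise rule can be checked directly: if a client's partial view happens to contain the server that owns a key on the full ring, the client picks exactly that server, because the first server clockwise among all servers is also the first clockwise within any subset that contains it. The sketch below tests this on random views. The view size of 5 out of 10, the seed and all names are assumptions for illustration only.

```python
import bisect
import hashlib
import random

def position(name: str) -> int:
    """Position on the ring as a 64-bit integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "big")

def ring_owner(key: str, view: list[str]) -> str:
    """First server clockwise from the key, among the servers in this view."""
    points = sorted((position(s), s) for s in view)
    i = bisect.bisect_right([p for p, _ in points], position(key))
    return points[i % len(points)][1]

servers = [f"cache-{i}" for i in range(10)]
rng = random.Random(0)

trials = 0
agreements = 0
for k in range(200):
    key = f"key-{k}"
    true_owner = ring_owner(key, servers)   # owner under the complete view
    view = rng.sample(servers, 5)           # a client that knows half the servers
    if true_owner in view:
        trials += 1
        agreements += ring_owner(key, view) == true_owner

print(f"{agreements} of {trials} partial views agreed with the full ring")
```

Plain ‘hash(k) mod n’ offers no comparable guarantee, since the index depends on how many servers the client knows and on how it numbers them.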
Scenario 2:
When the set of cache servers changes, the number of nodes changes: if we add a node, n becomes ‘n+1’, and if we remove a node, n becomes ‘n-1’.
In Hashing,
Instead of computing hash(k) mod n, we now have to compute hash(k) mod (n+1) or hash(k) mod (n-1). This reassigns key-value pairs to new locations that are effectively random across the cluster. For large n, we essentially have to move nearly all of the data to different servers.
The problems with this are:
1) Moving the data is slow and expensive.
2) The cache is unavailable while the data is being moved.
3) The database experiences heavy traffic as data moves from the DB to the new caches.
4) The high load on the database can make the entire system crash.
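The scale of the reshuffling under plain hashing can be estimated with a short experiment. The sketch below assumes SHA-256 as the hash and n = 10; both are illustrative choices, not part of the essay. For a uniform hash, a key keeps its server only when hash(k) mod n equals hash(k) mod (n+1), which happens for about 1/(n+1) of the keys, so roughly n/(n+1) of them move.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Convert a string to a non-negative integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

n = 10
keys = [f"key-{i}" for i in range(10_000)]

# A key moves if its server index differs after one server is added.
moved = sum(1 for k in keys if stable_hash(k) % n != stable_hash(k) % (n + 1))
fraction_moved = moved / len(keys)

print(f"{fraction_moved:.1%} of keys change server when n goes from {n} to {n + 1}")
```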
In Consistent Hashing,
The hash function is designed to maintain the ‘smoothness’, ‘spread’ and ‘load’ properties. A key’s position on the ring, hash(k)/R for hash values in the range [0,R), does not depend on the number of nodes. Thus, when a node is added or removed, only the data that needs to live on that node, or that needs to be removed from it, has to be moved. All the other data in the system stays where it is. When a new node is added, only the keys that lie between its predecessor and itself on the ring are moved to it, from the node that follows it clockwise. If a node is removed, all the data on that node is moved to the next node on the ring in the clockwise direction, since that is the first node a clockwise walk now reaches.
The advantages are:
1) Only a fraction of the data objects need to be moved, not all of them.
2) Moving the data takes less time, which avoids the cache and database availability issues.
3) Even in the presence of inconsistent views, references for a given object are directed to only a small number of caching machines.
4) Since little data is moved, no single cache is assigned an unreasonable number of objects.
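The first advantage can be checked with a small experiment on a ring where each server occupies one point. The sketch below is illustrative only: SHA-256 as the hash, the server names and the helper names position and owner are assumptions. It adds one node and removes one node, then lists which keys changed owner.

```python
import bisect
import hashlib

def position(name: str) -> int:
    """Position on the ring as a 64-bit integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "big")

def owner(key: str, servers: list[str]) -> str:
    """First server clockwise from the key's position, wrapping at the end."""
    points = sorted((position(s), s) for s in servers)
    i = bisect.bisect_right([p for p, _ in points], position(key))
    return points[i % len(points)][1]

servers = [f"cache-{i}" for i in range(10)]
keys = [f"key-{i}" for i in range(5_000)]
before = {k: owner(k, servers) for k in keys}

# Add a node: every key that moves must move to the new node.
grown = servers + ["cache-new"]
after_add = {k: owner(k, grown) for k in keys}
moved_on_add = [k for k in keys if after_add[k] != before[k]]

# Remove a node: only the keys that lived on it move.
shrunk = [s for s in servers if s != "cache-3"]
after_remove = {k: owner(k, shrunk) for k in keys}
moved_on_remove = [k for k in keys if after_remove[k] != before[k]]

print(f"moved on add: {len(moved_on_add) / len(keys):.1%}")
print(f"moved on remove: {len(moved_on_remove) / len(keys):.1%}")
```

With a single point per server the arcs can be quite uneven, so the fraction moved in one run may differ noticeably from 1/n; the point of the sketch is that every moved key either lands on the new node or comes from the removed one.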
