Distributed System Failures

Victoria White
Distributed System Failure
December 16, 2013

There are two broad types of system structure. The first is a centralized system, which consists of one or more major hubs through which all communication is processed. This setup provides security, to an extent, since all of the computing is done in one place. However, it also creates a single point of failure: if the main computer goes down, the whole system is down. The second is a distributed system, a collection of processors connected by a communication network. The processors may include microprocessors, workstations, minicomputers, and large computer systems, and they are known by several names: sites, hosts, nodes, computers, and machines. There are four major reasons for creating a distributed system: resource sharing, communication, reliability, and computation speedup. However, several problems may occur in a distributed system, including link failure, host failure, storage media failure, and limited scalability.
The first failure, link failure, occurs when the connection between two parts of the system fails. When this type of failure takes place, the two hosts on either end of the link can no longer communicate with each other directly. Link failure is detected with a procedure known as handshaking. Two hosts that share a direct link, host A and host B, send each other I-am-up messages at fixed intervals. If host A does not receive an I-am-up message from host B within a set period of time, it knows that something has gone wrong: the link has failed, host B has failed, or the message was lost. Host A then has two options. It can wait another designated period for the message to arrive, or it can send an Are-you-up message to host B. If there is still no response, host A can assume that the connection has failed. If the direct link from host A to host B has failed, the failure must be broadcast to every host in the system, so that the routing tables can be updated to route around the failed link.
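The handshaking steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `LinkMonitor`, the helper `remove_failed_link`, the status strings, and the use of a single timeout for both waiting periods are all hypothetical choices, not part of any real protocol. Times are passed in explicitly so the logic can be followed without a real network or clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkMonitor:
    """Host A's view of one neighbour, host B (hypothetical helper)."""
    timeout: float                          # how long to wait for an I-am-up message
    last_heard: float = 0.0                 # time the last I-am-up message arrived
    probe_sent_at: Optional[float] = None   # time an Are-you-up message was sent

    def on_i_am_up(self, now: float) -> None:
        # Any I-am-up message resets the monitor.
        self.last_heard = now
        self.probe_sent_at = None

    def check(self, now: float) -> str:
        if now - self.last_heard <= self.timeout:
            return "up"
        if self.probe_sent_at is None:
            # First missed deadline: the caller should send Are-you-up.
            self.probe_sent_at = now
            return "probe"
        if now - self.probe_sent_at > self.timeout:
            # Still silent after the probe: assume the connection has failed.
            return "failed"
        return "waiting"


def remove_failed_link(routes: dict, neighbour: str) -> dict:
    """Drop routes whose next hop is the unreachable neighbour.

    `routes` maps a destination to the next hop used to reach it
    (an assumed, simplified routing table).
    """
    return {dest: hop for dest, hop in routes.items() if hop != neighbour}
```

Once `check` returns `"failed"`, the host would broadcast the failure, and every host would apply something like `remove_failed_link` to its own table. Note that a timeout alone cannot tell host A whether the link or host B itself is at fault.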
The second failure, host failure, occurs when a host becomes isolated from the system or suffers a hardware failure. In both scenarios, communication with the host is lost. Host failure can be detected with the same handshaking procedure used for link failure: a message is sent, and the sender waits a set period of time for a response. If no response arrives, the sender can wait again or send an Are-you-up message, and if there is still no response it can assume the host is down. If the system believes a site has failed and can no longer be reached by the other sites, then all sites must be notified so that they no longer attempt to use the services of the failed site. If the failed site served as a “central coordinator,” the main site for a specific service, a new site must be elected to take over. However, if the site has only been isolated and is still up and running but not communicating, the election leaves two “central coordinators” for one service. This situation can produce conflicting actions, since both coordinators will try to complete some of the same processes.
Another type of failure is storage medium failure. This failure happens when a device on which data is placed, kept, and retrieved goes down. When it occurs, the system loses productivity, because processes that need the lost data cannot be completed. Storage medium failure can be mitigated with a RAID (redundant array of independent disks) setup. RAID stores data redundantly across independent disks, so if one of the disks goes down, the information is still available on the others.
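Returning to the coordinator election described for host failure, the following Python sketch shows one possible way to elect a replacement while guarding against two coordinators. It rests on two assumptions that the essay does not make: the reachable site with the highest id wins (an idea borrowed from bully-style election), and a group of sites may elect a coordinator only if it contains a majority of all sites. The function name `elect_coordinator` is hypothetical.

```python
def elect_coordinator(reachable_ids, total_sites):
    """Return the id of the new coordinator, or None if no majority is reachable.

    reachable_ids: ids of the sites this site can currently reach, itself included.
    total_sites:   number of sites in the whole system.
    """
    # A group holding half the sites or fewer refuses to elect anyone,
    # so two disconnected groups can never both pick a coordinator.
    if len(reachable_ids) * 2 <= total_sites:
        return None
    # Assumed rule: the highest id among the reachable sites wins.
    return max(reachable_ids)
```

With five sites, a group that can still reach sites 1, 2, and 3 elects site 3, while an old coordinator isolated with only one other site gets `None` and would have to step down. The price of this rule is that a system split exactly in half has no coordinator until the link is repaired.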
Storage medium failure is a common type of failure, since it can be caused by a simple hardware failure. Hardware failure can affect any type of system, whether it is a simple home system, a centralized system, or a distributed system. Scalability is also a common issue in any type of system. Scalability is the capability of a system to adapt to an increasing service load. When a distributed system is created, it is set up to handle a certain number of users, services, and locations. If the number of users increases, and with it the number of processes to be completed, all of the memory available on the system may be used up. If this occurs, response time slows and the system loses productivity. When the system is upgraded or the amount of information to be handled increases, one way to make sure the system can cope is to add resources. Although adding resources may help handle the greater flow of information, it can also create new issues, and the cost of expanding the system can be high.
Some failures of distributed systems can also occur in centralized systems. Two of them are host failure and storage media failure, which can occur in both types of system because they are basic failures. Host failure can occur because a host may lose communication with the others for any number of reasons. Storage media failure can likewise be caused by a simple hardware failure, and it can create numerous issues unless a scheme such as RAID is in place to ensure that information is still accessible if one of the storage devices goes down.
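To show how a RAID-style scheme can keep information accessible after one device goes down, here is a minimal Python sketch of XOR parity, the idea behind parity-based levels such as RAID 5. It is a toy model, not a real RAID implementation: the function names `xor_blocks` and `rebuild_missing`, the in-memory byte strings standing in for disks, and the equal block sizes are all assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single lost block from the survivors plus the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three "disks" holding one block each, plus a parity block on a fourth.
disks = [b"node", b"host", b"link"]
parity = xor_blocks(disks)

# Suppose the second disk goes down: its block is rebuilt from the rest.
rebuilt = rebuild_missing([disks[0], disks[2]], parity)
```

Because XOR-ing a value with itself cancels out, combining the surviving blocks with the parity block leaves exactly the missing block. The scheme tolerates the loss of only one disk at a time, which is why some RAID levels add mirroring or a second parity block.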
Failures can occur in any system, whether it is distributed, centralized, or something else. Common failures in a distributed system are link failure, host failure, storage media failure, and limited scalability. Once a failure is detected, there are solutions to fix it.

References:
Silberschatz, A., Galvin, P. B., & Gagne, G. (2012). Operating System Concepts (8th ed.). Retrieved from The University of Phoenix eBook Collection database.
