...Distributed Shared Memory Systems by Wilson Cheng-Yi Hsieh. S.B., Massachusetts Institute of Technology (1988); S.M., Massachusetts Institute of Technology (1988). Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science at the Massachusetts Institute of Technology, September 1995. © Massachusetts Institute of Technology 1995. All rights reserved. Author: Department of Electrical Engineering and Computer Science, September 5, 1995. Certified by: M. Frans Kaashoek, Assistant Professor of Computer Science and Engineering, Thesis Supervisor. Certified by: William E. Weihl, Associate Professor of Computer Science and Engineering, Thesis Supervisor. Accepted by: Frederic R. Morgenthaler, Chairman, Departmental Committee on Graduate Students. Dynamic Computation Migration in Distributed Shared Memory Systems ...
Words: 40765 - Pages: 164
...history of modern computing into the following eras:
1970s: Timesharing (1 computer with many users)
1980s: Personal computing (1 computer per user)
1990s: Parallel computing (many computers per user)
Until about 1980, computers were huge, expensive, and located in computer centers. Most organizations had a single large machine. In the 1980s, prices came down to the point where each user could have his or her own personal computer or workstation. These machines were often networked together, so that users could do remote logins on other people's computers or share files in various (often ad hoc) ways. Nowadays some systems have many processors per user, either in the form of a parallel computer or a large collection of CPUs shared by a small user community. Such systems are usually called parallel or distributed computer systems. This development raises the question of what kind of software will be needed for these new systems. To answer this question, a group under the direction of Prof. Andrew S. Tanenbaum at the Vrije Universiteit (VU) in Amsterdam (The Netherlands) has been doing research since 1980 in the area of distributed computer systems. This research, partly done in cooperation with the Centrum voor Wiskunde en Informatica (CWI), has resulted in the development of a new distributed operating system, called Amoeba, designed for an environment consisting of a large number of computers. Amoeba is available for free to universities and other educational...
Words: 4509 - Pages: 19
...TITLE: DESIGN ISSUES AND FUTURE TRENDS OF DISTRIBUTED SHARED MEMORY SYSTEMS
ABSTRACT
The distributed shared memory paradigm has recently gained considerable attention in the field of distributed systems. This work examines the system issues that arise in the design of distributed shared memory systems. The work is motivated by the observation that distributed systems will continue to grow in popularity and will largely be used to solve large computational problems. Because the shared memory paradigm offers a natural transition for a programmer coming from uniprocessors, it is very attractive for programming large distributed systems.
Introduction
The goal of this research is to identify a set of system issues, such as integration of DSM with virtual memory management, choice of memory model, choice of coherence protocol, and technology factors, and to evaluate the effects of the design alternatives on the performance of DSM systems. The design alternatives have been evaluated in three steps. First, we conduct a detailed performance study of a distributed shared memory implementation on the CLOUDS distributed operating system. Second, we implement and analyze the performance of several applications on a distributed shared memory system. Third, the system issues that could not be evaluated via the experimental study are evaluated using a simulation-based approach. The simulation model is developed from our experience with the CLOUDS distributed system....
Words: 1092 - Pages: 5
...limited speed and reliability because of the many moving parts. Modern machines use electronics for most information transmission. Computing is normally thought of as being divided into generations. Each successive generation is marked by sharp changes in hardware and software technologies. With some exceptions, most of the advances introduced in one generation are carried through to later generations. We are currently in the fifth generation.
First generation
Technology and Architecture: Vacuum tubes and relay memories; CPU driven by a program counter (PC) and accumulator; machines had only fixed-point arithmetic.
Software and Applications: Machine and assembly language; single user at a time; no subroutine linkage mechanisms; programmed I/O required continuous use of the CPU.
Representative systems: ENIAC, Princeton IAS, IBM 701.
Second generation
Technology and Architecture: Discrete transistors and core memories; I/O processors; multiplexed memory access; floating-point arithmetic available; Register Transfer Language (RTL) developed.
Software and Applications: High-level languages...
Words: 2199 - Pages: 9
...What is AMD's Heterogeneous System Architecture? By Ralph Efftien, Polytechnia Institute, CGS-1280C Computer Hardware
Berkeley Open Infrastructure for Network Computing (BOINC) evolved from the SetiAtHome screensaver program created by Dr. David Anderson at the University of California, Berkeley. Over its major revisions, BOINC has gone from running scientific applications during a CPU's idle time to also running those applications on the GPU (BOINC, 2013). Support first came through NVIDIA's CUDA routines; ATI's CAL routines were added later so that project applications could perform scientific calculations on those GPUs. Since acquiring ATI, AMD has dropped support for the CAL routines in favor of OpenCL for running general-purpose applications on the graphics coprocessor. The following are some of the projects that use the GPU:
1. http://einstein.phys.uwm.edu/
2. http://boinc.thesonntags.com/collatz/
3. http://milkyway.cs.rpi.edu/milkyway/
4. http://boinc.fzk.de/poem/
5. http://www.primegrid.com/
6. http://setiathome.berkeley.edu/
7. http://setiweb.ssl.berkeley.edu/beta/
8. http://moowrap.net/
These projects were among the first to use the GPU to perform calculations on work units. They all started with applications that BOINC ran on CPUs, which the programmers at the various projects then adapted to run on the GPU. BOINC, being a multi-threaded program, is able to run one project on each core of a system's CPU and anywhere...
Words: 1036 - Pages: 5
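The excerpt above notes that these projects now target the GPU through OpenCL. As a minimal sketch of what such a GPU computation looks like from the host side, here is a hypothetical vector-addition example using the PyOpenCL bindings; the kernel, array sizes, and buffer names are invented for illustration and are not taken from any BOINC project.

import numpy as np
import pyopencl as cl

# Host-side input data (illustrative sizes).
a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)

ctx = cl.create_some_context()          # pick an available OpenCL device (e.g. a GPU)
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# A trivial OpenCL kernel: each work item adds one pair of elements.
program = cl.Program(ctx, """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
""").build()

program.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)  # copy the GPU result back to the host
print(result[:4], (a + b)[:4])           # the two should match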
...White Paper. Thanks to the customer and IBM team for their contribution and support to this project. Trademarks The following terms are registered trademarks of International Business Machines Corporation in the United States and/or other countries: AIX, AS/400, DB2, IBM, Micro Channel, MQSeries, Netfinity, NUMAQ, OS/390, OS/400, Parallel Sysplex, PartnerLink, POWERparallel, RS/6000, S/390, Scalable POWERparallel Systems, Sequent, SP2, System/390, ThinkPad, WebSphere. The following terms are trademarks of International Business Machines Corporation in the United States and/or other countries: DB2 Universal Database, DEEP BLUE, e-business (logo), GigaProcessor, HACMP/6000, Intelligent Miner, iSeries, Network Station, NUMACenter, POWER2 Architecture, PowerPC 604, pSeries, Sequent (logo), SmoothStart, SP, xSeries, zSeries. A full list of U.S. trademarks owned by IBM may be found at http://iplswww.nas.ibm.com/wpts/trademarks/trademar.htm. NetView, Tivoli and TME are registered trademarks and TME Enterprise is a...
Words: 6610 - Pages: 27
...Answer 1.13:
a. Mainframe or minicomputer systems
Memory Resources: Main memory (RAM) is an important part of mainframe systems that must be carefully managed, as it is shared among a large number of users.
CPU Resources: Because the CPU is likewise shared among many users, CPU time must be carefully managed in mainframe and minicomputer systems.
Storage: Storage is an important resource that needs to be managed because it is shared among multiple users.
Network Bandwidth: Sharing of data is a major activity in systems used by multiple users, so network bandwidth must be managed in such systems.
b. Workstations connected to servers
Memory Resources: When workstations are connected to servers, multiple applications run on multiple workstations...
Words: 1265 - Pages: 6
...KVM architecture hosts the virtual machine images as regular Linux processes, so that each virtual machine image can use all of the features of the Linux kernel, including hardware, security, storage, and applications. You can use any type of storage that is supported by Linux to store virtual machine images, including local disks, SCSI, or network-attached storage such as NFS and SAN. The KVM hypervisor also supports virtual machine images on shared file systems such as the Global File System (GFS2), allowing the images to be shared by multiple hosts. With the KVM hypervisor, you can perform live migrations, moving a running virtual machine between physical hosts with no interruption to service. You can also save the current state of a virtual machine to disk so that you can resume the virtual machine from that state at a later time. Because the KVM architecture hosts the virtual machine images as regular Linux processes, you can use the standard Linux security measures to isolate the images and provide resource controls. The Linux kernel includes SELinux along with sVirt to isolate virtual images. In addition, you can use control groups (cgroups) to further restrict a set of tasks to a set of resources and monitor resource use. For more information about securing your KVM environment, see KVM security. The KVM architecture supports the memory management features of Linux. In addition, with Kernel Same-page Merging (KSM), virtual images can share memory pages...
Words: 366 - Pages: 2
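As a minimal sketch of the save/restore capability described in the KVM excerpt above, the following uses the libvirt Python bindings; the connection URI, guest name, and state-file path are assumptions chosen for illustration, not values from the excerpt's source.

import libvirt

# Connect to the local KVM/QEMU hypervisor (URI assumed for illustration).
conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("guest1")        # hypothetical guest name

# Save the running guest's state to disk; the guest stops running afterwards.
dom.save("/var/tmp/guest1.state")

# Later, restore the guest from the saved state file and resume where it left off.
conn.restore("/var/tmp/guest1.state")
conn.close()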
...A Look into Computer Memory
Table of Contents: Abstract; A Look into Computer Memory; Memory Hierarchy; Allocation Policies; Relocation Policies; Hit and Hit Ratio; Modern Computer Applications; Conclusion; References
Abstract
The memory of a computer is a key component of its overall architecture. Several types of memory exist within the architecture of a computer, and collectively they are known as the memory hierarchy. How the memory hierarchy is used, that is, how information is placed and moved, is governed by the allocation and relocation policies. How well these policies allow the processor to find the information it is looking for (a hit) is measured by the hit ratio. Modern processors rely on the memory hierarchy to maintain their high performance. This paper looks at how these pieces and policies work together within the architecture of a computer.
A Look into Computer Memory
Memory plays a key role in the modern processor. The memory hierarchy is the foundation upon which the allocation and relocation policies operate. These policies work to place the needed information in the proper memory level, maintaining a high hit ratio and avoiding processor delay. Regardless of the speed of a modern processor, a low hit ratio adds delay.
Memory Hierarchy
Memory in a computer varies in size, in speed with regard to access time and, just as importantly,...
Words: 1554 - Pages: 7
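To make the hit-ratio point from the excerpt above concrete, here is a small worked example of effective access time; the cache and main-memory latencies are assumed values used only for illustration.

# Effective access time: hits are served by the cache, misses fall through to main memory.
def effective_access_time(hit_ratio, cache_ns=2.0, memory_ns=60.0):
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns

for h in (0.99, 0.90, 0.50):
    print(f"hit ratio {h:.2f} -> {effective_access_time(h):.1f} ns average access time")

# With these assumed latencies, dropping from a 99% to a 50% hit ratio raises the
# average access time from about 2.6 ns to 31 ns, which is the delay the excerpt describes.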
...Neumann architecture, it is necessary to understand the previous architecture of the ENIAC computer. That computer's hardware consisted of various vacuum tubes arranged in such a way as to process data. In special-purpose computers, all the instructions are known in advance, and what needs to be done is simply to fetch data, process it, and produce results. John von Neumann devised an architecture for general-purpose computers that is still used today. In this architecture, instructions are not wired into the hardware but are treated in the same way as data. Binary instruction codes are fetched into the CPU; they have the same length as data words and contain information about which operations to run and which addresses to read or write. The two essential parts of the von Neumann architecture are the Arithmetic Logic Unit (CA) and the Program Control Unit (CC), which are combined to form the Central Processing Unit (CPU). The main function of a computer, as the name implies, is to compute, specifically to perform certain arithmetic (add, subtract, multiply, divide) and logic (comparison) operations; this is carried out by the central arithmetical (CA) part of the computer. The CA, however, is unable to perform computations by itself; instead, a special part tells the CA what kind of operations to perform, what sequence of instructions to carry out, where to look for the parameters (data), and where to store the results of operations. This part actually controls the program stored in the memory and...
Words: 604 - Pages: 3
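The excerpt above describes the stored-program idea: instructions live in the same memory as data, a control unit steps through them, and the arithmetic unit does the work. The following toy machine is a sketch of that fetch-decode-execute cycle; the instruction format, opcodes, and addresses are invented for illustration and do not come from the excerpt's source.

# Instructions and data share one memory; the program counter (pc) drives execution.
memory = {
    0: ("LOAD", 10),    # accumulator <- memory[10]
    1: ("ADD", 11),     # accumulator <- accumulator + memory[11]
    2: ("STORE", 12),   # memory[12] <- accumulator
    3: ("HALT", None),
    10: 7, 11: 5, 12: 0,
}

pc, acc = 0, 0
while True:
    opcode, addr = memory[pc]   # fetch: instructions come from the same memory as data
    pc += 1                     # strictly sequential execution
    if opcode == "LOAD":        # the "CC" role: decode and direct the operation
        acc = memory[addr]
    elif opcode == "ADD":       # the "CA" role: arithmetic on the accumulator
        acc += memory[addr]
    elif opcode == "STORE":
        memory[addr] = acc
    elif opcode == "HALT":
        break

print(memory[12])               # 12, i.e. 7 + 5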
...1: Computer Architecture. Tony D. Everett, Professor Jennifer Merritt, CIS106 Intro to Information Technology, July 28, 2013
Introduction
The use of Information Technology is well recognized. Information Technology has become a must for the survival of all business houses given the growing IT trends. The computer is the main component of any Information Technology system. Today, computer technology has entered nearly every sphere of modern life. From airline reservations to auto repair diagnosis, from government services databases to manufacturing and production systems used by the likes of Pepsi, Kellogg's and Kraft Foods, everywhere we witness the elegance, complexity and effectiveness possible only with the help of computers. These systems operate using the von Neumann architecture.
The Von Neumann Architecture and Importance
The von Neumann architecture is a design model for a stored-program digital computer that uses a processing unit and a single separate storage structure to hold both instructions and data. The instructions are executed sequentially, which is a relatively slow process. There is one shared memory for instructions and data, with one data bus and one address bus between processor and memory. Instructions and data have to be fetched in sequential order (known as the von Neumann bottleneck), limiting the operating bandwidth. Its design is simpler than that of the Harvard architecture. It is mostly used to interface to external memory. Von Neumann architecture computers are...
Words: 1237 - Pages: 5
...features require changes to the OS!
Impact of OSs on Modern CPU Designs (February 2008)
Degrees of Freedom for CPU Designs: address and instruction width; memory bus connection; instruction set; pipeline stages; number of execution units; number of cores; number of CPUs; CPU interconnects; caches; ...
AMD's HW/SW Co-Design Approach (diagram): a full in-house design cycle linking a next-generation CPU/GPU hardware architecture, a CPU behavioural description, binary code, a cycle-accurate simulator, and an operating system prototype (the OS reference implementation), with architecture improvements and code improvements flowing between them.
Uniform vs. Non-Uniform Memory (diagram): in the traditional x86 architecture, the frontside bus limits memory bandwidth to a fixed maximum; in the Direct Connect Architecture, memory bandwidth scales with the number of processors.
Example: Advanced Synchronization Facility
A proposed facility for low-overhead atomic memory modification: change a set of cache lines, then mass-commit them atomically. It is a primitive for building higher-level synchronization primitives (roll your own DCAS / LL-SC) and is highly flexible. Use almost...
Words: 816 - Pages: 4
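The slide fragment above mentions using ASF to "roll your own DCAS". As a conceptual sketch of what a double compare-and-swap does, here is a Python illustration in which atomicity is emulated with a single lock; real hardware support such as ASF exists precisely to avoid that kind of global lock, and all names here are invented for illustration.

import threading

_lock = threading.Lock()

def dcas(cell1, cell2, expected1, expected2, new1, new2):
    """Atomically update two memory cells only if both still hold their expected values."""
    with _lock:  # stands in for the hardware's atomic multi-line commit
        if cell1[0] == expected1 and cell2[0] == expected2:
            cell1[0], cell2[0] = new1, new2
            return True
        return False

a, b = [1], [2]
print(dcas(a, b, 1, 2, 10, 20))  # True: both cells matched, both were updated
print(a[0], b[0])                # 10 20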
...TARCAD: A Template Architecture for Reconfigurable Accelerator Designs. Muhammad Shafiq and Miquel Pericàs (Computer Sciences Dept., Barcelona Supercomputing Center), Nacho Navarro (Arquitectura de Computadors, Universitat Politècnica de Catalunya), Eduard Ayguadé (Computer Sciences, Barcelona Supercomputing Center), Barcelona, Spain. {muhammad.shafiq, miquel.pericas}@bsc.es, nacho@ac.upc.edu, eduard.ayguade@bsc.es
Abstract: In the race towards computational efficiency, accelerators are achieving prominence. Among the different types, accelerators built using reconfigurable fabric, such as FPGAs, have tremendous potential due to the ability to customize the hardware to the application. However, the lack of a standard design methodology hinders the adoption of such devices and makes portability and reusability across designs difficult. In addition, the generation of highly customized circuits does not integrate nicely with high-level synthesis tools. In this work, we introduce TARCAD, a template architecture for designing reconfigurable accelerators. TARCAD enables high customization in the data management and compute engines while retaining a programming model based on generic programming principles. The template features generality and scalable performance over a range of FPGAs. We describe the template architecture in detail and show how to implement five important scientific kernels: MxM, Acoustic Wave Equation, FFT, SpMV and Smith-Waterman. TARCAD is compared...
Words: 7421 - Pages: 30
...REPORT: SWEL: Hardware Cache Coherence Protocols to Map Shared Data onto Shared Caches. Asadullah, 12/15/2013. …
Contents: Abstract; Introduction; Proposed Solution (SWEL); Optimizations of SWEL; Dynamically Tuned RSWEL; Implementation; Experiment and Results; Conclusion; References
Abstract
Shared-memory multiprocessors require cache coherence in order to keep cached values up to date. Snooping and directory-based protocols are two well-known classes of cache coherence protocols. However, both have problems: snooping is not scalable and is only suitable for SMP systems of 2 to 8 processors, whereas directory-based protocols incur memory overhead when there are many sharers of a particular block. We propose a novel cache coherence protocol that exploits private blocks of memory: the coherence protocol is invoked only for shared data blocks. This reduces network and storage overhead without compromising scalability.
Introduction
A shared-memory multiprocessor has multiple processors, each with its own cache, and a global memory. The memory is connected to the processors and a global address space is maintained. When a block of data is cached by exactly one processor, it is said to be private. The block is called shared if more than one processor caches it. In the latter case it is necessary that a read operation by any processor returns...
Words: 1891 - Pages: 8
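The private/shared distinction in the excerpt above is the property SWEL-style protocols exploit: coherence actions are only needed for blocks cached by more than one processor. The following is a minimal bookkeeping sketch of that classification; the data structures and function names are invented for illustration and are not the paper's actual mechanism.

from collections import defaultdict

sharers = defaultdict(set)   # block address -> set of processor ids that have cached it

def record_access(block, cpu):
    sharers[block].add(cpu)

def classification(block):
    n = len(sharers[block])
    return "uncached" if n == 0 else "private" if n == 1 else "shared"

record_access(0x40, cpu=0)
record_access(0x80, cpu=0)
record_access(0x80, cpu=1)
print(classification(0x40))   # private: only CPU 0 has cached it, no coherence traffic needed
print(classification(0x80))   # shared: this is the only block the coherence protocol must track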
...CSC 213 ARCHITECTURE ASSIGNMENT
QUESTION 1
2.1. What is a stored-program computer? A stored-program computer is a computer that uses the stored-program concept. The stored-program concept is the idea that the programming process can be facilitated if the program is represented in a form suitable for storing in memory alongside the data. A computer can then get its instructions by reading them from memory, and a program can be set or altered by setting the values of a portion of memory.
2.2. The four main components of any general-purpose computer:
* Main memory (M)
* I/O module (I, O)
* Arithmetic-logic unit (CA)
* Program control unit (CC)
2.3. The three principal constituents of a computer system at the integrated-circuit level:
* Transistors
* Resistors
* Capacitors
2.4. Explain Moore's law. Moore's law was propounded by Gordon Moore, cofounder of Intel, in 1965. Moore observed that the number of transistors that could be put on a single chip was doubling every year, and he correctly predicted that this pace would continue into the near future. To the surprise of many, including Moore, the pace continued year after year and decade after decade. The pace slowed to a doubling every 18 months in the 1970s, but has sustained that rate ever since (a short numeric sketch follows this excerpt).
2.5. The key characteristics of a computer family:
* Similar or identical instruction set: In many cases, the exact...
Words: 1531 - Pages: 7
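As referenced in answer 2.4 above, here is a short numeric sketch of the 18-month doubling rate; the starting transistor count and time spans are assumptions chosen for illustration.

# Transistor count after a given time, assuming a fixed doubling period of 18 months.
def transistor_count(initial, months, doubling_period_months=18):
    return initial * 2 ** (months / doubling_period_months)

start = 2_300   # roughly the transistor count of the Intel 4004 (1971), used as a reference point
for years in (3, 10, 20):
    print(f"after {years:2d} years: about {transistor_count(start, years * 12):,.0f} transistors")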