...TARCAD: A Template Architecture for Reconfigurable Accelerator Designs Muhammad Shafiq, Miquel Peric` s a Nacho Navarro Eduard Ayguad´ e Computer Sciences Dept. Arquitectura de Computadors Computer Sciences Barcelona Supercomputing Center Universitat Polit` cnica de Catalunya Barcelona Supercomputing Center e Barcelona, Spain Barcelona, Spain Barcelona, Spain {muhammad.shafiq, miquel.pericas}@bsc.es nacho@ac.upc.edu eduard.ayguade@bsc.es Abstract—In the race towards computational efficiency, accelerators are achieving prominence. Among the different types, accelerators built using reconfigurable fabric, such as FPGAs, have a tremendous potential due to the ability to customize the hardware to the application. However, the lack of a standard design methodology hinders the adoption of such devices and makes difficult the portability and reusability across designs. In addition, generation of highly customized circuits does not integrate nicely with high level synthesis tools. In this work, we introduce TARCAD, a template architecture to design reconfigurable accelerators. TARCAD enables high customization in the data management and compute engines while retaining a programming model based on generic programming principles. The template features generality and scalable performance over a range of FPGAs. We describe the template architecture in detail and show how to implement five important scientific kernels: MxM, Acoustic Wave Equation, FFT, SpMV and Smith Waterman. TARCAD is compared...
Words: 7421 - Pages: 30
...1 Civilization advances by extending the number of important operations which we can perform without thinking about them. Alfred North Whitehead, An Introduction to Mathematics, 1911 Computer Abstractions and Technology 1.1 Introduction 3 1.2 Eight Great Ideas in Computer Architecture 11 1.3 Below Your Program 13 1.4 Under the Covers 16 1.5 Technologies for Building Processors and Memory 24 1.6 Performance 28 1.7 The Power Wall 40 1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 43 1.9 Real Stuff: Benchmarking the Intel Core i7 46 1.10 Fallacies and Pitfalls 49 1.11 Concluding Remarks 52 1.12 Historical Perspective and Further Reading 54 1.13 Exercises 54 1.1 Introduction Welcome to this book! We’re delighted to have this opportunity to convey the excitement of the world of computer systems. This is not a dry and dreary field, where progress is glacial and where new ideas atrophy from neglect. No! Computers are the product of the incredibly vibrant information technology industry, all aspects of which are responsible for almost 10% of the gross national product of the United States, and whose economy has become dependent in part on the rapid improvements in information technology promised by Moore’s Law. This unusual industry embraces innovation at a breath-taking rate. In the last 30 years, there have been a number of new computers whose introduction...
Words: 24107 - Pages: 97
...Filesystem for Parallel Applications John Bent∗† Garth Gibson‡ Gary Grider∗ Ben McClelland∗ , , , , Paul Nowoczynski§ James Nunez∗ Milo Polte† Meghan Wingate∗ , , , ABSTRACT Categories and Subject Descriptors D.4.3 [Operating Systems]: File Systems ManagementFile organization General Terms Performance, Design Keywords High performance computing, parallel computing, checkpointing, parallel file systems and IO ∗ LANL Technical Information Release: 09-02117 Los Alamos National Laboratory ‡ Carnegie Mellon University § Pittsburgh Supercomputing Center † (c) 2009 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by a contractor or affiliate of the U.S. Government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. SC09 November 14–20, Portland, Oregon, USA. Copyright 2009 ACM 978-1-60558-744-8/09/11 ...$10.00. 100 Speedup (X) Parallel applications running across thousands of processors must protect themselves from inevitable system failures. Many applications insulate themselves from failures by checkpointing. For many applications, checkpointing into a shared single file is most convenient. With such an approach, the size of writes are often small and not aligned with file system boundaries. Unfortunately for these applications, this preferred data layout results in...
Words: 12373 - Pages: 50
...COMPUTER ORGANIZATION AND ARCHITECTURE DESIGNING FOR PERFORMANCE EIGHTH EDITION William Stallings Prentice Hall Upper Saddle River, NJ 07458 Library of Congress Cataloging-in-Publication Data On File Vice President and Editorial Director: Marcia J. Horton Editor-in-Chief: Michael Hirsch Executive Editor: Tracy Dunkelberger Associate Editor: Melinda Haggerty Marketing Manager: Erin Davis Senior Managing Editor: Scott Disanno Production Editor: Rose Kernan Operations Specialist: Lisa McDowell Art Director: Kenny Beck Cover Design: Kristine Carney Director, Image Resource Center: Melinda Patelli Manager, Rights and Permissions: Zina Arabia Manager, Visual Research: Beth Brenzel Manager, Cover Visual Research & Permissions: Karen Sanatar Composition: Rakesh Poddar, Aptara®, Inc. Cover Image: Picturegarden /Image Bank /Getty Images, Inc. Copyright © 2010, 2006 by Pearson Education, Inc., Upper Saddle River, New Jersey, 07458. Pearson Prentice Hall. All rights reserved. Printed in the United States of America. This publication is protected by Copyright and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permission(s), write to: Rights and Permissions Department. Pearson Prentice Hall™ is a trademark of Pearson Education, Inc. Pearson® is a registered trademark of...
Words: 239771 - Pages: 960
...An Analysis of Linux Scalability to Many Cores Silas Boyd-Wickizer, Austin T. Clements, Yandong Mao, Aleksey Pesterev, M. Frans Kaashoek, Robert Morris, and Nickolai Zeldovich MIT CSAIL A BSTRACT This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques— this paper introduces one new technique, sloppy counters—these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required in total 3002 lines of code changes. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet. but the other applications scale poorly, performing much less work per core with 48 cores than with one core. We attempt to understand and fix the scalability problems, by modifying either the applications or the Linux kernel. We then iterate, since fixing one scalability problem usually exposes further ones. The end result for each application is either good scalability on 48 cores, or attribution of non-scalability to a hard-to-fix problem with the application, the Linux kernel, or the underlying hardware. The analysis of whether the kernel design is compatible with scaling rests on the extent to which...
Words: 12751 - Pages: 52
...School of Computer and Communications Science at EPFL Jacob Leverich Hewlett-Packard Kevin Lim Hewlett-Packard John Nickolls NVIDIA John Oliver Cal Poly, San Luis Obispo Milos Prvulovic Georgia Tech Partha Ranganathan Hewlett-Packard Table of Contents Cover image Title page In Praise of Computer Organization and Design: The Hardware/Software Interface, Fifth Edition Front-matter Copyright Dedication Acknowledgments Preface About This Book About the Other Book Changes for the Fifth Edition Changes for the Fifth Edition Concluding Remarks Acknowledgments for the Fifth Edition 1. Computer Abstractions and Technology 1.1 Introduction 1.2 Eight Great Ideas in Computer Architecture 1.3 Below Your Program 1.4 Under the Covers 1.5 Technologies for Building Processors and Memory 1.6 Performance 1.7 The Power Wall 1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 1.9 Real Stuff: Benchmarking the Intel Core i7 1.10 Fallacies and Pitfalls 1.11 Concluding Remarks 1.12 Historical Perspective and Further Reading 1.13 Exercises 2. Instructions: Language of the Computer 2.1 Introduction 2.2 Operations of the Computer Hardware 2.3 Operands of the Computer Hardware 2.4 Signed and Unsigned Numbers 2.5 Representing Instructions in the Computer 2.6 Logical Operations 2.7...
Words: 79060 - Pages: 317
...4. 4.1 Big Data Introduction In 2004, Wal-Mart claimed to have the largest data warehouse with 500 terabytes storage (equivalent to 50 printed collections of the US Library of Congress). In 2009, eBay storage amounted to eight petabytes (think of 104 years of HD-TV video). Two years later, the Yahoo warehouse totalled 170 petabytes1 (8.5 times of all hard disk drives created in 1995)2. Since the rise of digitisation, enterprises from various verticals have amassed burgeoning amounts of digital data, capturing trillions of bytes of information about their customers, suppliers and operations. Data volume is also growing exponentially due to the explosion of machine-generated data (data records, web-log files, sensor data) and from growing human engagement within the social networks. The growth of data will never stop. According to the 2011 IDC Digital Universe Study, 130 exabytes of data were created and stored in 2005. The amount grew to 1,227 exabytes in 2010 and is projected to grow at 45.2% to 7,910 exabytes in 2015.3 The growth of data constitutes the “Big Data” phenomenon – a technological phenomenon brought about by the rapid rate of data growth and parallel advancements in technology that have given rise to an ecosystem of software and hardware products that are enabling users to analyse this data to produce new and more granular levels of insight. Figure 1: A decade of Digital Universe Growth: Storage in Exabytes Error! Reference source not found.3 1 ...
Words: 22222 - Pages: 89
...congestion manifests itself differently in a NoC than in traditional networks. Network congestion reduces system throughput in congested workloads for smaller NoCs (16 and 64 nodes), and limits the scalability of larger bufferless NoCs (256 to 4096 nodes) even when traffic has locality (e.g., when an application’s required data is mapped nearby to its core in the network). We propose a new source throttlingbased congestion control mechanism with application-level awareness that reduces network congestion to improve system performance. Our mechanism improves system performance by up to 28% (15% on average in congested workloads) in smaller NoCs, achieves linear throughput scaling in NoCs up to 4096 cores (attaining similar performance scalability to a NoC with large buffers), and reduces power consumption by up to 20%. Thus, we show an effective application of a network-level concept, congestion control, to a class of networks – bufferless on-chip networks – that has not been studied before by the networking community. Categories and Subject Descriptors C.1.2 [Computer Systems Organization]: Multiprocessors – Interconnection architectures; C.2.1 [Network Architecture and Design]: Packet-switching networks Keywords On-chip networks, multi-core, congestion control 1. INTRODUCTION One of the most...
Words: 13410 - Pages: 54
... the essentials of Linda Null Pennsylvania State University Julia Lobur Pennsylvania State University World Headquarters Jones and Bartlett Publishers 40 Tall Pine Drive Sudbury, MA 01776 978-443-5000 info@jbpub.com www.jbpub.com Jones and Bartlett Publishers Canada 2406 Nikanna Road Mississauga, ON L5C 2W6 CANADA Jones and Bartlett Publishers International Barb House, Barb Mews London W6 7PA UK Copyright © 2003 by Jones and Bartlett Publishers, Inc. Cover image © David Buffington / Getty Images Illustrations based upon and drawn from art provided by Julia Lobur Library of Congress Cataloging-in-Publication Data Null, Linda. The essentials of computer organization and architecture / Linda Null, Julia Lobur. p. cm. ISBN 0-7637-0444-X 1. Computer organization. 2. Computer architecture. I. Lobur, Julia. II. Title. QA76.9.C643 N85 2003 004.2’2—dc21 2002040576 All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without written permission from the copyright owner. Chief Executive Officer: Clayton Jones Chief Operating Officer: Don W. Jones, Jr. Executive V.P. and Publisher: Robert W. Holland, Jr. V.P., Design and Production: Anne Spencer V.P., Manufacturing and Inventory Control: Therese Bräuer Director, Sales and Marketing: William Kane Editor-in-Chief...
Words: 118595 - Pages: 475
...UNIVERSITY OF KERALA B. TECH. DEGREE COURSE 2008 ADMISSION REGULATIONS and I VIII SEMESTERS SCHEME AND SYLLABUS of COMPUTER SCIENCE AND ENGINEERING B.Tech Comp. Sc. & Engg., University of Kerala 2 UNIVERSITY OF KERALA B.Tech Degree Course – 2008 Scheme REGULATIONS 1. Conditions for Admission Candidates for admission to the B.Tech degree course shall be required to have passed the Higher Secondary Examination, Kerala or 12th Standard V.H.S.E., C.B.S.E., I.S.C. or any examination accepted by the university as equivalent thereto obtaining not less than 50% in Mathematics and 50% in Mathematics, Physics and Chemistry/ Bio- technology/ Computer Science/ Biology put together, or a diploma in Engineering awarded by the Board of Technical Education, Kerala or an examination recognized as equivalent thereto after undergoing an institutional course of at least three years securing a minimum of 50 % marks in the final diploma examination subject to the usual concessions allowed for backward classes and other communities as specified from time to time. 2. Duration of the course i) The course for the B.Tech Degree shall extend over a period of four academic years comprising of eight semesters. The first and second semester shall be combined and each semester from third semester onwards shall cover the groups of subjects as given in the curriculum and scheme of examination ii) Each semester shall ordinarily comprise of not less than 400 working periods each of 60 minutes duration...
Words: 34195 - Pages: 137
...Oracle® Database Concepts 10g Release 2 (10.2) B14220-02 October 2005 Oracle Database Concepts, 10g Release 2 (10.2) B14220-02 Copyright © 1993, 2005, Oracle. All rights reserved. Primary Author: Michele Cyran Contributing Author: Paul Lane, JP Polk Contributor: Omar Alonso, Penny Avril, Hermann Baer, Sandeepan Banerjee, Mark Bauer, Bill Bridge, Sandra Cheevers, Carol Colrain, Vira Goorah, Mike Hartstein, John Haydu, Wei Hu, Ramkumar Krishnan, Vasudha Krishnaswamy, Bill Lee, Bryn Llewellyn, Rich Long, Diana Lorentz, Paul Manning, Valarie Moore, Mughees Minhas, Gopal Mulagund, Muthu Olagappan, Jennifer Polk, Kathy Rich, John Russell, Viv Schupmann, Bob Thome, Randy Urbano, Michael Verheij, Ron Weiss, Steve Wertheimer The Programs (which include both the software and documentation) contain proprietary information; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs, except to the extent required to obtain interoperability with other independently created software or as specified by law, is prohibited. The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. This document is not warranted to be error-free. Except as may be expressly permitted in your license agreement...
Words: 199783 - Pages: 800
...(IJCSIS) International Journal of Computer Science and Information Security, Vol. 10, No. 3, March 2012 An Efficient Automatic Attendance System Using Fingerprint Reconstruction Technique Josphineleela.R Dr.M.Ramakrishnan Research scholar Department of Computer Science and Engineering Sathyabamauniversity Chennai,India ilanleela@yahoo.com Professor/HOD-IT Velammal Engineering College Chennai,India ramkrishod@gmail.com Abstract— Biometric time and attendance system is one of the most successful applications of biometric technology. One of the main advantage of a biometric time and attendance system is it avoids "buddy-punching". Buddy punching was a major loophole which will be exploiting in the traditional time attendance systems. Fingerprint recognition is an established field today, but still identifying individual from a set of enrolled fingerprints is a time taking process. Most fingerprint-based biometric systems store the minutiae template of a user in the database. It has been traditionally assumed that the minutiae template of a user does not reveal any information about the original fingerprint. This belief has now been shown to be false; several algorithms have been proposed that can reconstruct fingerprint images from minutiae templates. In this paper, a novel fingerprint reconstruction algorithm is proposed to reconstruct the phase image, which is then converted into the grayscale image. The proposed reconstruction algorithm ...
Words: 3558 - Pages: 15
...optimization of a 2.4 Kbps mixed excitation linear prediction (MELP) speech-coding algorithm on a Texas instrument TMS320C64xx digital signal processor. The main emphases of the project is on efficient coding and optimization strategies in computationally intensive modules by exploiting fast L1/L2 memory architecture, 8-parallel data paths, enhanced direct memory access module and instruction set features of TMS320C6000 architecture. The implementation would be based on generic TMS320C6000 DSP series; the optimization techniques aimed are applicable to all types of DSP platforms. Lastly the enhanced DSP/BIOS features were used to implement a real time data handling technique to port the MELP vocoder for real time applications. Contents 1 Introduction 6 1.1 THE MELP SPEECH PRODUCTION MODEL 6 2 Development Process 29 2.1 DSP Architecture 29 2.1.1 INTRODUCTION 29 2.1.2 DSK SUPPORT TOOLS 30 2.1.3 CODE COMPOSER STUDIO 32 2.1.4 SUPPORT FILES 32 2.2 DSP Supporting Architecture 34 2.2.1 AIC-23 Codec 34 2.2.2 Multi-Channel Serial Buffered Port 43 2.2.3 Enhanced Direct Memory Access 52 2.3 Adaptation and Optimization 79...
Words: 9574 - Pages: 39
...Deep Learning more at http://ml.memect.com Contents 1 Artificial neural network 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.1 Improvements since 2006 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3.1 Network function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3.2 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3.3 Learning paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3.4 Learning algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 Employing artificial neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5.1 Real-life applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5.2 Neural networks and neuroscience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.6 Neural network software ...
Words: 55759 - Pages: 224
...research and systems available, fraudsters have been able to outsmart and deceive the banks and their customers. With this in mind, we intend to introduce a novel and multi-purpose technology known as Stream Computing, as the basis for a Fraud Detection solution. Indeed, we believe that this architecture will stimulate research, and more importantly organizations, to invest in Analytics and Statistical Fraud-Scoring to be used in conjunction with the already in-place preventive techniques. Therefore, in this research we explore different strategies to build a Streambased Fraud Detection solution, using advanced Data Mining Algorithms and Statistical Analysis, and show how they lead to increased accuracy in the detection of fraud by at least 78% in our reference dataset. We also discuss how a combination of these strategies can be embedded in a Stream-based application to detect fraud in real-time. From this perspective, our experiments lead to an average processing time of 111,702ms per transaction, while strategies to further improve the performance are discussed. Keywords: Fraud Detection, Stream Computing, Real-Time Analysis, Fraud, Data Mining, Retail Banking Industry, Data Preprocessing, Data Classification, Behavior-based Models, Supervised Analysis, Semi-supervised Analysis Sammanfattning Privatbankerna har drabbats hårt av bedrägerier de senaste åren. Bedragare har lyckats kringgå forskning och tillgängliga system och lura bankerna och deras kunder. Därför vill vi införa...
Words: 56858 - Pages: 228