A Course in In-Memory Data Management

A Course in In-Memory Data Management

Author: Hasso Plattner

Publisher: Springer

Published: 2014-05-28

Total Pages: 315

ISBN-13: 3642552706

DOWNLOAD EBOOK

Recent achievements in hardware and software development, such as multi-core CPUs and DRAM capacities of multiple terabytes per server, enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of enterprise data. Professor Hasso Plattner and his research group at the Hasso Plattner Institute in Potsdam, Germany, have been investigating and teaching the corresponding concepts and their adoption in the software industry for years. This book is based on an online course that was first launched in autumn 2012 with more than 13,000 enrolled students and marked the successful starting point of the openHPI e-learning platform. The course is mainly designed for students of computer science, software engineering, and IT related subjects, but addresses business experts, software developers, technology experts, and IT analysts alike. Plattner and his group focus on exploring the inner mechanics of a column-oriented dictionary-encoded in-memory database. Covered topics include - amongst others - physical data storage and access, basic database operators, compression mechanisms, and parallel join algorithms. Beyond that, implications for future enterprise applications and their development are discussed. Step by step, readers will understand the radical differences and advantages of the new technology over traditional row-oriented, disk-based databases. In this completely revised 2nd edition, we incorporate the feedback of thousands of course participants on openHPI and take into account latest advancements in hard- and software. Improved figures, explanations, and examples further ease the understanding of the concepts presented. We introduce advanced data management techniques such as transparent aggregate caches and provide new showcases that demonstrate the potential of in-memory databases for two diverse industries: retail and life sciences.


Book Synopsis A Course in In-Memory Data Management by : Hasso Plattner

Download or read book A Course in In-Memory Data Management written by Hasso Plattner and published by Springer. This book was released on 2014-05-28 with total page 315 pages. Available in PDF, EPUB and Kindle. Book excerpt: Recent achievements in hardware and software development, such as multi-core CPUs and DRAM capacities of multiple terabytes per server, enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of enterprise data. Professor Hasso Plattner and his research group at the Hasso Plattner Institute in Potsdam, Germany, have been investigating and teaching the corresponding concepts and their adoption in the software industry for years. This book is based on an online course that was first launched in autumn 2012 with more than 13,000 enrolled students and marked the successful starting point of the openHPI e-learning platform. The course is mainly designed for students of computer science, software engineering, and IT related subjects, but addresses business experts, software developers, technology experts, and IT analysts alike. Plattner and his group focus on exploring the inner mechanics of a column-oriented dictionary-encoded in-memory database. Covered topics include - amongst others - physical data storage and access, basic database operators, compression mechanisms, and parallel join algorithms. Beyond that, implications for future enterprise applications and their development are discussed. Step by step, readers will understand the radical differences and advantages of the new technology over traditional row-oriented, disk-based databases. In this completely revised 2nd edition, we incorporate the feedback of thousands of course participants on openHPI and take into account latest advancements in hard- and software. Improved figures, explanations, and examples further ease the understanding of the concepts presented. We introduce advanced data management techniques such as transparent aggregate caches and provide new showcases that demonstrate the potential of in-memory databases for two diverse industries: retail and life sciences.


In-Memory Data Management

In-Memory Data Management

Author: Hasso Plattner

Publisher: Springer Science & Business Media

Published: 2011-03-08

Total Pages: 245

ISBN-13: 3642193633

DOWNLOAD EBOOK

In the last 50 years the world has been completely transformed through the use of IT. We have now reached a new inflection point. Here we present, for the first time, how in-memory computing is changing the way businesses are run. Today, enterprise data is split into separate databases for performance reasons. Analytical data resides in warehouses, synchronized periodically with transactional systems. This separation makes flexible, real-time reporting on current data impossible. Multi-core CPUs, large main memories, cloud computing and powerful mobile devices are serving as the foundation for the transition of enterprises away from this restrictive model. We describe techniques that allow analytical and transactional processing at the speed of thought and enable new ways of doing business. The book is intended for university students, IT-professionals and IT-managers, but also for senior management who wish to create new business processes by leveraging in-memory computing.


Book Synopsis In-Memory Data Management by : Hasso Plattner

Download or read book In-Memory Data Management written by Hasso Plattner and published by Springer Science & Business Media. This book was released on 2011-03-08 with total page 245 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the last 50 years the world has been completely transformed through the use of IT. We have now reached a new inflection point. Here we present, for the first time, how in-memory computing is changing the way businesses are run. Today, enterprise data is split into separate databases for performance reasons. Analytical data resides in warehouses, synchronized periodically with transactional systems. This separation makes flexible, real-time reporting on current data impossible. Multi-core CPUs, large main memories, cloud computing and powerful mobile devices are serving as the foundation for the transition of enterprises away from this restrictive model. We describe techniques that allow analytical and transactional processing at the speed of thought and enable new ways of doing business. The book is intended for university students, IT-professionals and IT-managers, but also for senior management who wish to create new business processes by leveraging in-memory computing.


In-Memory Data Management

In-Memory Data Management

Author: Hasso Plattner

Publisher: Springer Science & Business Media

Published: 2012-05-14

Total Pages: 286

ISBN-13: 3642295746

DOWNLOAD EBOOK

This book examines for the first time, the ways that in-memory computing is changing the way businesses are run. The authors describe techniques that allow analytical and transactional processing at the speed of thought and enable new ways of doing business.


Book Synopsis In-Memory Data Management by : Hasso Plattner

Download or read book In-Memory Data Management written by Hasso Plattner and published by Springer Science & Business Media. This book was released on 2012-05-14 with total page 286 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book examines for the first time, the ways that in-memory computing is changing the way businesses are run. The authors describe techniques that allow analytical and transactional processing at the speed of thought and enable new ways of doing business.


Database Internals

Database Internals

Author: Alex Petrov

Publisher: O'Reilly Media

Published: 2019-09-13

Total Pages: 373

ISBN-13: 1492040312

DOWNLOAD EBOOK

When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals. Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed. This book examines: Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable Log Structured storage engines, with differences and use-cases for each Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as Page Cache, Buffer Pool and Write-Ahead Log Distributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns Database clusters: Which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency


Book Synopsis Database Internals by : Alex Petrov

Download or read book Database Internals written by Alex Petrov and published by O'Reilly Media. This book was released on 2019-09-13 with total page 373 pages. Available in PDF, EPUB and Kindle. Book excerpt: When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals. Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed. This book examines: Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable Log Structured storage engines, with differences and use-cases for each Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as Page Cache, Buffer Pool and Write-Ahead Log Distributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns Database clusters: Which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency


Building a Columnar Database on RAMCloud

Building a Columnar Database on RAMCloud

Author: Christian Tinnefeld

Publisher: Springer

Published: 2015-07-07

Total Pages: 130

ISBN-13: 3319207113

DOWNLOAD EBOOK

This book examines the field of parallel database management systems and illustrates the great variety of solutions based on a shared-storage or a shared-nothing architecture. Constantly dropping memory prices and the desire to operate with low-latency responses on large sets of data paved the way for main memory-based parallel database management systems. However, this area is currently dominated by the shared-nothing approach in order to preserve the in-memory performance advantage by processing data locally on each server. The main argument this book makes is that such an unilateral development will cease due to the combination of the following three trends: a) Today’s network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing main memory on a server and of a remote server to and even below a single order of magnitude. b) Modern storage systems scale gracefully, are elastic and provide high-availability. c) A modern storage system such as Stanford’s RAM Cloud even keeps all data resident in the main memory. Exploiting these characteristics in the context of a main memory-based parallel database management system is desirable. The book demonstrates that the advent of RDMA-enabled network technology makes the creation of a parallel main memory DBMS based on a shared-storage approach feasible.


Book Synopsis Building a Columnar Database on RAMCloud by : Christian Tinnefeld

Download or read book Building a Columnar Database on RAMCloud written by Christian Tinnefeld and published by Springer. This book was released on 2015-07-07 with total page 130 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book examines the field of parallel database management systems and illustrates the great variety of solutions based on a shared-storage or a shared-nothing architecture. Constantly dropping memory prices and the desire to operate with low-latency responses on large sets of data paved the way for main memory-based parallel database management systems. However, this area is currently dominated by the shared-nothing approach in order to preserve the in-memory performance advantage by processing data locally on each server. The main argument this book makes is that such an unilateral development will cease due to the combination of the following three trends: a) Today’s network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing main memory on a server and of a remote server to and even below a single order of magnitude. b) Modern storage systems scale gracefully, are elastic and provide high-availability. c) A modern storage system such as Stanford’s RAM Cloud even keeps all data resident in the main memory. Exploiting these characteristics in the context of a main memory-based parallel database management system is desirable. The book demonstrates that the advent of RDMA-enabled network technology makes the creation of a parallel main memory DBMS based on a shared-storage approach feasible.


Fast and Scalable Cloud Data Management

Fast and Scalable Cloud Data Management

Author: Felix Gessert

Publisher: Springer Nature

Published: 2020-05-15

Total Pages: 199

ISBN-13: 3030435067

DOWNLOAD EBOOK

The unprecedented scale at which data is both produced and consumed today has generated a large demand for scalable data management solutions facilitating fast access from all over the world. As one consequence, a plethora of non-relational, distributed NoSQL database systems have risen in recent years and today’s data management system landscape has thus become somewhat hard to overlook. As another consequence, complex polyglot designs and elaborate schemes for data distribution and delivery have become the norm for building applications that connect users and organizations across the globe – but choosing the right combination of systems for a given use case has become increasingly difficult as well. To help practitioners stay on top of that challenge, this book presents a comprehensive overview and classification of the current system landscape in cloud data management as well as a survey of the state-of-the-art approaches for efficient data distribution and delivery to end-user devices. The topics covered thus range from NoSQL storage systems and polyglot architectures (backend) over distributed transactions and Web caching (network) to data access and rendering performance in the client (end-user). By distinguishing popular data management systems by data model, consistency guarantees, and other dimensions of interest, this book provides an abstract framework for reasoning about the overall design space and the individual positions claimed by each of the systems therein. Building on this classification, this book further presents an application-driven decision guidance tool that breaks the process of choosing a set of viable system candidates for a given application scenario down into a straightforward decision tree.


Book Synopsis Fast and Scalable Cloud Data Management by : Felix Gessert

Download or read book Fast and Scalable Cloud Data Management written by Felix Gessert and published by Springer Nature. This book was released on 2020-05-15 with total page 199 pages. Available in PDF, EPUB and Kindle. Book excerpt: The unprecedented scale at which data is both produced and consumed today has generated a large demand for scalable data management solutions facilitating fast access from all over the world. As one consequence, a plethora of non-relational, distributed NoSQL database systems have risen in recent years and today’s data management system landscape has thus become somewhat hard to overlook. As another consequence, complex polyglot designs and elaborate schemes for data distribution and delivery have become the norm for building applications that connect users and organizations across the globe – but choosing the right combination of systems for a given use case has become increasingly difficult as well. To help practitioners stay on top of that challenge, this book presents a comprehensive overview and classification of the current system landscape in cloud data management as well as a survey of the state-of-the-art approaches for efficient data distribution and delivery to end-user devices. The topics covered thus range from NoSQL storage systems and polyglot architectures (backend) over distributed transactions and Web caching (network) to data access and rendering performance in the client (end-user). By distinguishing popular data management systems by data model, consistency guarantees, and other dimensions of interest, this book provides an abstract framework for reasoning about the overall design space and the individual positions claimed by each of the systems therein. Building on this classification, this book further presents an application-driven decision guidance tool that breaks the process of choosing a set of viable system candidates for a given application scenario down into a straightforward decision tree.


Non-Volatile Memory Database Management Systems

Non-Volatile Memory Database Management Systems

Author: Joy Arulraj

Publisher: Morgan & Claypool Publishers

Published: 2019-02-12

Total Pages: 193

ISBN-13: 1681734850

DOWNLOAD EBOOK

This book explores the implications of non-volatile memory (NVM) for database management systems (DBMSs). The advent of NVM will fundamentally change the dichotomy between volatile memory and durable storage in DBMSs. These new NVM devices are almost as fast as volatile memory, but all writes to them are persistent even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. With NVM, many of the components of legacy DBMSs are unnecessary and will degrade the performance of data-intensive applications. We present the design and implementation of DBMS architectures that are explicitly tailored for NVM. The book focuses on three aspects of a DBMS: (1) logging and recovery, (2) storage and buffer management, and (3) indexing. First, we present a logging and recovery protocol that enables the DBMS to support near-instantaneous recovery. Second, we propose a storage engine architecture and buffer management policy that leverages the durability and byte-addressability properties of NVM to reduce data duplication and data migration. Third, the book presents the design of a range index tailored for NVM that is latch-free yet simple to implement. All together, the work described in this book illustrates that rethinking the fundamental algorithms and data structures employed in a DBMS for NVM improves performance and availability, reduces operational cost, and simplifies software development.


Book Synopsis Non-Volatile Memory Database Management Systems by : Joy Arulraj

Download or read book Non-Volatile Memory Database Management Systems written by Joy Arulraj and published by Morgan & Claypool Publishers. This book was released on 2019-02-12 with total page 193 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book explores the implications of non-volatile memory (NVM) for database management systems (DBMSs). The advent of NVM will fundamentally change the dichotomy between volatile memory and durable storage in DBMSs. These new NVM devices are almost as fast as volatile memory, but all writes to them are persistent even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. With NVM, many of the components of legacy DBMSs are unnecessary and will degrade the performance of data-intensive applications. We present the design and implementation of DBMS architectures that are explicitly tailored for NVM. The book focuses on three aspects of a DBMS: (1) logging and recovery, (2) storage and buffer management, and (3) indexing. First, we present a logging and recovery protocol that enables the DBMS to support near-instantaneous recovery. Second, we propose a storage engine architecture and buffer management policy that leverages the durability and byte-addressability properties of NVM to reduce data duplication and data migration. Third, the book presents the design of a range index tailored for NVM that is latch-free yet simple to implement. All together, the work described in this book illustrates that rethinking the fundamental algorithms and data structures employed in a DBMS for NVM improves performance and availability, reduces operational cost, and simplifies software development.


DAMA-DMBOK

DAMA-DMBOK

Author: Dama International

Publisher:

Published: 2017

Total Pages: 628

ISBN-13: 9781634622349

DOWNLOAD EBOOK

Defining a set of guiding principles for data management and describing how these principles can be applied within data management functional areas; Providing a functional framework for the implementation of enterprise data management practices; including widely adopted practices, methods and techniques, functions, roles, deliverables and metrics; Establishing a common vocabulary for data management concepts and serving as the basis for best practices for data management professionals. DAMA-DMBOK2 provides data management and IT professionals, executives, knowledge workers, educators, and researchers with a framework to manage their data and mature their information infrastructure, based on these principles: Data is an asset with unique properties; The value of data can be and should be expressed in economic terms; Managing data means managing the quality of data; It takes metadata to manage data; It takes planning to manage data; Data management is cross-functional and requires a range of skills and expertise; Data management requires an enterprise perspective; Data management must account for a range of perspectives; Data management is data lifecycle management; Different types of data have different lifecycle requirements; Managing data includes managing risks associated with data; Data management requirements must drive information technology decisions; Effective data management requires leadership commitment.


Book Synopsis DAMA-DMBOK by : Dama International

Download or read book DAMA-DMBOK written by Dama International and published by . This book was released on 2017 with total page 628 pages. Available in PDF, EPUB and Kindle. Book excerpt: Defining a set of guiding principles for data management and describing how these principles can be applied within data management functional areas; Providing a functional framework for the implementation of enterprise data management practices; including widely adopted practices, methods and techniques, functions, roles, deliverables and metrics; Establishing a common vocabulary for data management concepts and serving as the basis for best practices for data management professionals. DAMA-DMBOK2 provides data management and IT professionals, executives, knowledge workers, educators, and researchers with a framework to manage their data and mature their information infrastructure, based on these principles: Data is an asset with unique properties; The value of data can be and should be expressed in economic terms; Managing data means managing the quality of data; It takes metadata to manage data; It takes planning to manage data; Data management is cross-functional and requires a range of skills and expertise; Data management requires an enterprise perspective; Data management must account for a range of perspectives; Data management is data lifecycle management; Different types of data have different lifecycle requirements; Managing data includes managing risks associated with data; Data management requirements must drive information technology decisions; Effective data management requires leadership commitment.


High-Performance In-Memory Genome Data Analysis

High-Performance In-Memory Genome Data Analysis

Author: Hasso Plattner

Publisher: Springer Science & Business Media

Published: 2013-11-19

Total Pages: 239

ISBN-13: 3319030353

DOWNLOAD EBOOK

Recent achievements in hardware and software developments have enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of data, such as diagnoses, therapies, and human genome data. This book shares the latest research results of applying in-memory data management to personalized medicine, changing it from computational possibility to clinical reality. The authors provide details on innovative approaches to enabling the processing, combination, and analysis of relevant data in real-time. The book bridges the gap between medical experts, such as physicians, clinicians, and biological researchers, and technology experts, such as software developers, database specialists, and statisticians. Topics covered in this book include - amongst others - modeling of genome data processing and analysis pipelines, high-throughput data processing, exchange of sensitive data and protection of intellectual property. Beyond that, it shares insights on research prototypes for the analysis of patient cohorts, topology analysis of biological pathways, and combined search in structured and unstructured medical data, and outlines completely new processes that have now become possible due to interactive data analyses.


Book Synopsis High-Performance In-Memory Genome Data Analysis by : Hasso Plattner

Download or read book High-Performance In-Memory Genome Data Analysis written by Hasso Plattner and published by Springer Science & Business Media. This book was released on 2013-11-19 with total page 239 pages. Available in PDF, EPUB and Kindle. Book excerpt: Recent achievements in hardware and software developments have enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of data, such as diagnoses, therapies, and human genome data. This book shares the latest research results of applying in-memory data management to personalized medicine, changing it from computational possibility to clinical reality. The authors provide details on innovative approaches to enabling the processing, combination, and analysis of relevant data in real-time. The book bridges the gap between medical experts, such as physicians, clinicians, and biological researchers, and technology experts, such as software developers, database specialists, and statisticians. Topics covered in this book include - amongst others - modeling of genome data processing and analysis pipelines, high-throughput data processing, exchange of sensitive data and protection of intellectual property. Beyond that, it shares insights on research prototypes for the analysis of patient cohorts, topology analysis of biological pathways, and combined search in structured and unstructured medical data, and outlines completely new processes that have now become possible due to interactive data analyses.


Data Analytics with Hadoop

Data Analytics with Hadoop

Author: Benjamin Bengfort

Publisher: "O'Reilly Media, Inc."

Published: 2016-06

Total Pages: 288

ISBN-13: 1491913762

DOWNLOAD EBOOK

Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job. Instead of deployment, operations, or software development usually associated with distributed computing, you’ll focus on particular analyses you can build, the data warehousing techniques that Hadoop provides, and higher order data workflows this framework can produce. Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical processes and data systems available to build and empower data products that can handle—and actually require—huge amounts of data. Understand core concepts behind Hadoop and cluster computing Use design patterns and parallel analytical algorithms to create distributed data analysis jobs Learn about data management, mining, and warehousing in a distributed context using Apache Hive and HBase Use Sqoop and Apache Flume to ingest data from relational databases Program complex Hadoop and Spark applications with Apache Pig and Spark DataFrames Perform machine learning techniques such as classification, clustering, and collaborative filtering with Spark’s MLlib


Book Synopsis Data Analytics with Hadoop by : Benjamin Bengfort

Download or read book Data Analytics with Hadoop written by Benjamin Bengfort and published by "O'Reilly Media, Inc.". This book was released on 2016-06 with total page 288 pages. Available in PDF, EPUB and Kindle. Book excerpt: Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job. Instead of deployment, operations, or software development usually associated with distributed computing, you’ll focus on particular analyses you can build, the data warehousing techniques that Hadoop provides, and higher order data workflows this framework can produce. Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical processes and data systems available to build and empower data products that can handle—and actually require—huge amounts of data. Understand core concepts behind Hadoop and cluster computing Use design patterns and parallel analytical algorithms to create distributed data analysis jobs Learn about data management, mining, and warehousing in a distributed context using Apache Hive and HBase Use Sqoop and Apache Flume to ingest data from relational databases Program complex Hadoop and Spark applications with Apache Pig and Spark DataFrames Perform machine learning techniques such as classification, clustering, and collaborative filtering with Spark’s MLlib