Data Integration Tool

A data integration tool is software that enables the construction and implementation of data access and data delivery infrastructure for a variety of data integration scenarios, such as:


  • Data acquisition for business intelligence (BI), analytics and data warehousing: extracting data from operational systems, transforming and merging that data, and delivering it to integrated data structures for analytics. This increasingly covers nonrelational data and Hadoop distributions to support discovery, predictive modeling, in-memory DBMSs, logical data warehouse architectures, end-user data integration, and the demand to integrate machine data and support the Internet of Things (IoT).
  • Sourcing and delivery of application and master data in support of application data management and master data management (MDM)
  • Data consistency between operational applications; for example, involving data structures for SaaS applications or cloud-resident data sources, in a bidirectional or unidirectional manner
  • Interenterprise data sharing
  • Populating and managing data in a data lake: data is continuously collected and stored either in a semantically consistent approach similar to a traditional DBMS, or in a nontraditional DBMS whose semantics are refined by later data processing efforts. This brings a need to integrate nonrelational structures and to distribute computing workloads to parallelized processes (such as Hadoop and alternative NoSQL repositories)
  • Data migration: moving data to a new platform or to an updated version of an existing data management platform, or moving data from one application to a new application or to an upgraded version of that application.

IBM offers the following data integration products: 
  • IBM InfoSphere Information Server Enterprise Edition
  • IBM InfoSphere Information Server Enterprise Hypervisor Edition
  • IBM InfoSphere Federation Server
  • IBM InfoSphere Data Replication
  • IBM Data Integration for Enterprise
  • IBM Data Integration for Hadoop
  • IBM BigInsights BigIntegrate
  • IBM Streams 
  • IBM Bluemix Data Connect (previously DataWorks)

Informatica offers the following data integration products: 

  • Informatica platform (PowerCenter, PowerExchange, Data Replication, Advanced Data Transformation, Ultra Messaging, B2B Data Transformation, B2B Data Exchange and Data Integration Hub)
  • Informatica Data Services
  • Informatica Intelligent Cloud Services
  • Cloud Integration Hub
  • Big Data Management
  • Big Data Integration Hub
  • Informatica Intelligent Streaming
  • Informatica Intelligent Data Lake and Informatica Data Preparation
Oracle offers the following data integration products: 
  • Oracle Data Integrator (ODI)
  • Oracle Data Integrator Cloud Service
  • Oracle GoldenGate
  • Oracle GoldenGate Cloud Service
  • Oracle Data Service Integrator and Oracle Service Bus
SAP offers the following data integration products: 
  • SAP Data Services
  • SAP Replication Server
  • SAP Landscape Transformation Replication Server
  • SAP Remote Data Sync
  • SAP Data Hub
  • SAP HANA platform
  • SAP Cloud Platform Integration 
  • SAP Event Stream Processor
Talend offers the following data integration products:
  • Talend Open Studio
  • Talend Data Fabric
  • Talend Data Management Platform
  • Talend Platform for Big Data
  • Talend Data Services Platform
  • Talend Integration Cloud 
  • Talend Data Preparation

Functional Criteria of a Data Integration (DI) Tool:

1. Connectivity/adapter capabilities. Data source and target support, with the ability to interact with a range of different data structure types (a minimal connectivity sketch follows this list), such as:
  • Relational databases
  • Legacy and nonrelational databases
  • Various file formats
  • XML
  • Packaged applications, such as those for customer relationship management (CRM) and supply chain management (SCM)
  • SaaS and cloud-based applications and sources
  • Industry-standard message formats, such as electronic data interchange (EDI), Health Level Seven International (HL7) and Society for Worldwide Interbank Financial Telecommunication (SWIFT)
  • Parallel distributed processing environments such as Hadoop Distributed File System (HDFS) and other nonrelational-type repositories such as graph, table-style, document store and key-value DBMSs
  • Message queues, including those provided by application integration middleware products and standards-based products (such as Java Message Service)
  • Data types of a less-structured nature, such as those associated with social media, web clickstreams, email, websites, office productivity tools and content
  • Emergent sources, such as data on in-memory repositories, mobile platforms and spatial applications
  • Screen-scraping and/or user interaction simulations (for example, scripts to interact with the web, 3270 or VT100 terminals, and others)
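
To make the first of these concrete, the sketch below reads rows from a relational source over plain JDBC. It is only a minimal illustration, not any vendor's adapter: the driver, URL, credentials and the SRC_CUSTOMERS table are hypothetical placeholders, and a suitable JDBC driver is assumed to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class RelationalSourceReader {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; a real tool reads these from adapter configuration.
            String url = "jdbc:postgresql://localhost:5432/sourcedb";
            try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT id, name, updated_at FROM SRC_CUSTOMERS")) {
                while (rs.next()) {
                    // In a real pipeline each row would be handed to transformation and delivery steps.
                    System.out.printf("%d,%s,%s%n",
                            rs.getLong("id"), rs.getString("name"), rs.getTimestamp("updated_at"));
                }
            }
        }
    }
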
2. Modes of interaction. Ability to interact with a range of data structure types in different modes, including:
  • Bulk/batch acquisition and delivery
  • Granular trickle-feed acquisition and delivery
  • Change data capture (CDC) — the ability to identify and extract modified data (see the sketch after this list)
  • Event-based acquisition (time-based, data-value-based or links to application integration tools to interact with message request/reply, publish/subscribe and routing)
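
One simple way to approximate change data capture without reading database logs is to poll a last-modified timestamp column and keep a watermark between runs. The hedged Java sketch below illustrates the idea; the table, columns and watermark handling are assumptions for illustration only.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public class TimestampBasedCdc {
        // Extract only rows changed since the last successful run and return the new watermark.
        public static Timestamp extractChanges(Connection conn, Timestamp lastRun) throws SQLException {
            String sql = "SELECT id, name, updated_at FROM SRC_CUSTOMERS WHERE updated_at > ? ORDER BY updated_at";
            Timestamp watermark = lastRun;
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setTimestamp(1, lastRun);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Deliver the changed row downstream (trickle feed); here it is just printed.
                        System.out.println(rs.getLong("id") + " changed at " + rs.getTimestamp("updated_at"));
                        watermark = rs.getTimestamp("updated_at");
                    }
                }
            }
            return watermark; // persist this value so the next polling cycle starts from here
        }
    }
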
3. Data delivery capabilities. The ability to provide data to consuming applications, processes and databases in a variety of modes, including:
  • Physical bulk/batch data movement between data repositories, such as processes for ETL or extraction, loading and transformation (ELT)
  • Data virtualization
  • Message-oriented encapsulation and movement of data (via linkage with application integration tool capability)
  • Data synchronization when distributed datasets must resolve data collisions resulting from distinct changes in disparate copies of data to retain data consistency
  • Replication of data between homogeneous or heterogeneous DBMSs and schemas
  • Migration of data across versions of data repositories (such as databases, file systems, and so on) and applications (resolving logical differences to achieve physical migration)
4. Latency requirements:
  • Scheduled batch delivery
  • Streaming/near-real-time delivery
  • Event-driven delivery of data based on identification of a relevant event
5. Data transformation capabilities. Built-in capabilities for achieving data transformation operations of varying complexity (a small transformation sketch follows this list), including:
  • Basic transformations, such as data-type conversions, string manipulations and simple calculations
  • Transformations of intermediate complexity, such as look-up and replace operations, aggregations, summarizations, integrated time series, deterministic matching and the management of slowly changing dimensions
  • Complex transformations, such as sophisticated parsing operations on free-form text, rich media and patterns/events in big data
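
As a small illustration of the basic and intermediate transformations listed above, the Java sketch below performs a string-to-number conversion, a look-up-and-replace against a reference map, and an aggregation per group. The input rows and look-up values are made up for the example.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class SimpleTransformations {
        public static void main(String[] args) {
            // Hypothetical input rows: a country code and an amount that arrives as a string.
            List<String[]> rows = List.of(
                    new String[]{"US", "10.50"},
                    new String[]{"DE", "7.25"},
                    new String[]{"US", "3.00"});

            // Look-up and replace: map codes to full country names.
            Map<String, String> countryLookup = Map.of("US", "United States", "DE", "Germany");

            // Basic type conversion (string to double) plus aggregation per country.
            Map<String, Double> totalsByCountry = rows.stream().collect(Collectors.groupingBy(
                    r -> countryLookup.getOrDefault(r[0], "Unknown"),
                    Collectors.summingDouble(r -> Double.parseDouble(r[1]))));

            totalsByCountry.forEach((country, total) ->
                    System.out.printf("%s -> %.2f%n", country, total));
        }
    }
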
6. Metadata and data modeling support, including:
  • Automated discovery/acquisition of metadata from data sources, applications and other tools
  • Discernment of relationships between data models and business process models
  • Data model creation and maintenance
  • Physical-to-logical model mapping and rationalization
  • Ability to define model-to-model relationships via graphical attribute-level mapping
  • Lineage and impact analysis reporting, in graphical and tabular formats
  • An open metadata repository, with the ability to share metadata bidirectionally with other tools
  • Automated synchronization of metadata across multiple instances of the tools
  • Ability to extend the metadata repository with customer-defined metadata attributes and relationships
  • Documentation of project/program delivery definitions and design principles in support of requirements definition activities
  • A business analyst/end-user interface to view and work with metadata
7. Design and development environment capabilities. Facilities for enabling the specification and construction of data integration processes, including:
  • Graphical representation of repository objects, data models and data flows
  • Management of the development process workflow, addressing requirements such as approvals and promotions
  • Granular, role-based and developer-based security
  • Team-based development capabilities, such as version control and collaboration
  • Functionality to support reuse across developers and projects, and to facilitate the identification of redundancies
  • A common or shared user interface for design and development (of diverse data delivery styles, data integration and data quality operations, cloud and on-premises environments, and so on)
  • A business analyst/end-user interface to specify and manage mapping and transformation logic through the use of end-user functionality for data integration/preparation
  • Support for testing and debugging
8. Information governance support capabilities (via interoperation with data quality, profiling and mining capabilities in the vendor's or a third party's tools). Mechanisms to work with related capabilities to help with the understanding and assurance of data quality over time, including interoperability with:
  • Data profiling tools (profiling and monitoring the conditions of data quality); a minimal profiling sketch follows this list
  • Data mining tools (relationship discovery)
  • Data quality tools (supporting data quality improvements)
  • In-line scoring and evaluation of data moving through the processes
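
The profiling side of this can be as simple as counting nulls and distinct values per column. The sketch below is a minimal, hand-rolled example of that idea, not a stand-in for a real data profiling tool; the column name and values are invented.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Objects;

    public class ColumnProfiler {
        // Profile one column of an extracted dataset: row count, null count and distinct values.
        public static void profile(String columnName, List<String> values) {
            long nulls = values.stream().filter(Objects::isNull).count();
            long distinct = values.stream().filter(Objects::nonNull).distinct().count();
            System.out.printf("%s: rows=%d, nulls=%d, distinct=%d%n",
                    columnName, values.size(), nulls, distinct);
        }

        public static void main(String[] args) {
            profile("email", Arrays.asList("a@example.com", null, "b@example.com", "a@example.com"));
        }
    }
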
9. Deployment options and runtime platform capabilities. Breadth of support for the hardware and operating systems on which data integration processes may be deployed, and the choices of delivery model — specifically:
  • Mainframe environments, such as IBM z/OS and z/Linux
  • Midrange environments, such as IBM i or Hewlett Packard Enterprise (HPE) NonStop
  • Unix-based environments
  • Windows environments
  • Linux environments
  • On-premises (at the customer site) installation and deployment of software
  • Hosted off-premises software deployment (dedicated, single-tenant implementation)
  • Integration platform as a service (iPaaS), consumed by the customer completely "as a service" — the vendor provides cloud infrastructure; the customer does not install or administer the software
  • Cloud deployment support (requires organizations to deploy software in a cloud infrastructure); importantly, the ability to design once but deploy across multiple or even hybrid/mixed environments, on-premises, in the cloud, or both
  • In-memory computing environment
  • Server virtualization (support for shared, virtualized implementations)
  • Parallel distributed processing, such as Apache Hadoop, MapReduce, or leveraging Apache Spark or Hadoop YARN (Yet Another Resource Negotiator)
10. Operations and administration capabilities. Facilities for enabling adequate ongoing support, management, monitoring and control of the data integration processes implemented by the tools (a small error-handling sketch follows this list), such as:
  • Error-handling functionality, both predefined and customizable
  • Monitoring and control of runtime processes, both via functionality in the tools and through interoperability with other IT operations technologies
  • Collection of runtime statistics to determine use and efficiency, as well as an application-style interface for visualization and evaluation
  • Security controls, for both data in-flight and administrator processes
  • A runtime architecture that ensures performance and scalability
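
To give a feel for the error-handling and runtime-statistics points above, the sketch below wraps one integration step with simple retry logic and reports how long it took. It is a generic illustration, not how any particular product implements these facilities.

    import java.util.concurrent.Callable;

    public class RetryingStep {
        // Run one integration step with simple retries and report basic runtime statistics.
        public static <T> T runWithRetry(Callable<T> step, int maxAttempts) throws Exception {
            long start = System.currentTimeMillis();
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    T result = step.call();
                    System.out.println("Step succeeded on attempt " + attempt
                            + " in " + (System.currentTimeMillis() - start) + " ms");
                    return result;
                } catch (Exception e) {
                    System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                    if (attempt == maxAttempts) {
                        throw e; // surface the error after the final retry
                    }
                }
            }
            throw new IllegalStateException("unreachable");
        }
    }
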
11. Architecture and integration capabilities. The degree of commonality, consistency and interoperability between the various components of the data integration toolset, including:
  • A minimal number of products (ideally one) supporting all data delivery modes
  • Common metadata (a single repository) and/or the ability to share metadata across all components and data delivery modes
  • A common design environment to support all data delivery modes
  • The ability to switch seamlessly and transparently between delivery modes (bulk/batch versus granular real-time versus federation) with minimal rework
  • Interoperability with other integration tools and applications, via certified interfaces, robust APIs and links to messaging support
  • Efficient support for all data delivery modes, regardless of runtime architecture type (centralized server engine versus distributed runtime)
  • The ability to execute data integration in cloud and on-premises environments, as appropriate, where developed artifacts can be interchanged, reused and deployed across both environments with minimal rework
12. Service enablement capabilities. As acceptance of data service concepts continues to grow, data integration tools must exhibit service-oriented characteristics and provide support for SOA, such as (a minimal data service sketch follows this list):
  • The ability to deploy all aspects of runtime functionality as data services (for example, deployed functionality can be called via a web services interface)
  • Management of publication and testing of data services
  • Interaction with service repositories and registries
  • Service enablement of development and administration environments, so that external tools and applications can dynamically modify and control the runtime behavior of the tools
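
As a minimal picture of what "deploying functionality as a data service" can mean, the sketch below exposes a small, hard-coded dataset over HTTP using the JDK's built-in HttpServer. The endpoint, port and payload are assumptions for illustration; real tools publish services through their own runtimes.

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class CustomerDataService {
        public static void main(String[] args) throws Exception {
            // Expose a hard-coded dataset at /customers so consumers can call it as a service.
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/customers", exchange -> {
                byte[] body = "[{\"id\":1,\"name\":\"Acme\"}]".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            System.out.println("Data service listening on http://localhost:8080/customers");
        }
    }
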


Talend 



Repository Tab 

Context and Metadata 

Headers 

Mapping Data 

Data Generation 

Outputting tMap Rejects

Filtering Mapped Data 

Mapping Database Data 

Regex Expressions 
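
As a small, hedged illustration of the kind of regular-expression work these jobs involve, the Java snippet below pulls an area code and local number out of a phone string; the pattern and sample value are made up and are not tied to any specific Talend component.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexExample {
        public static void main(String[] args) {
            // Capture the area code and local number from a simple US-style phone string.
            Pattern phone = Pattern.compile("\\((\\d{3})\\)\\s*(\\d{3}-\\d{4})");
            Matcher m = phone.matcher("(212) 555-0100");
            if (m.matches()) {
                System.out.println("Area code: " + m.group(1) + ", number: " + m.group(2));
            }
        }
    }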

XML to CSV conversion 

CSV to XML conversion 

Advanced XML 

Joining Data

Splitting Data 

Performing Advanced Data Lookups 

Uniting Data 


Retrieving and working with File Properties


Row Count 

Temp, Delete, Move, Rename 

Basics of the tJava component
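
A tJava component simply runs a block of Java once at its point in the job. The fragment below is a hedged example of the kind of code typed into its Code box rather than a standalone class; the context variable name is hypothetical, and context/globalMap are the variables Talend jobs expose to such code.

    // Example of code placed in a tJava component's Code box (variable names are hypothetical).
    System.out.println("Job started for environment: " + context.environment);

    // Store a value in the globalMap so later components in the same job can read it.
    globalMap.put("runStartedAt", new java.util.Date());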

More features of tJava component 

tJavaRow component 
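
tJavaRow differs from tJava in that its code runs once per row, reading from input_row and writing to output_row as defined by the component's schema. The fragment below is a hedged sketch; the column names are invented.

    // Per-row code placed in a tJavaRow component; column names here are hypothetical.
    output_row.id = input_row.id;
    output_row.name = input_row.name == null ? "" : input_row.name.trim().toUpperCase();
    output_row.loadDate = new java.util.Date();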

tJavaFlex component 

Custom Routines 
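
Custom routines are plain Java classes of static helper methods that Talend keeps in the routines package, so they can be called from component and tMap expressions. Below is a minimal, hypothetical routine; the class and method names are made up for the example.

    package routines;

    public class StringCleanup {
        /**
         * Hypothetical helper: trim a value and collapse internal whitespace.
         * Callable from expressions as StringCleanup.normalize(row1.name), for example.
         */
        public static String normalize(String value) {
            if (value == null) {
                return null;
            }
            return value.trim().replaceAll("\\s+", " ");
        }
    }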

Databases in Talend 

Working with HyperSQL (HSQLDB)

Looking at Table Schemas

Importing Table Schemas

Reading Data from a Database 

Writing Data to a Database 

Debugging 

Reject Handling 

Reject Handling with the tMsgBox component

Saving Console Log to a flat file 

tMap Input Reject Handling 

Logging Job Information 

Monitoring Jobs 

Building Talend Jobs

Adding context from command line 

Additional context variable modifications 

Scheduling Talend Jobs

Sub Jobs 

Custom Components 

Creating a job with tFileExcel Custom Components 

Using tSystem Component 

Using tSCP components



