Data Integration Tool

A data integration tool is software that enables the construction and implementation of data access and data delivery infrastructure for a variety of data integration scenarios, such as:


  • Data acquisition for business intelligence (BI), analytics and data warehousing: extracting data from operational systems, transforming and merging that data, and delivering it to integrated data structures for analytics. This increasingly covers nonrelational data and Hadoop distributions to support discovery, predictive modeling, in-memory DBMSs, logical data warehouse architectures, end-user data integration, and the demand to integrate machine data and support the Internet of Things (IoT).
  • Sourcing and delivery of application and master data in support of application data management and master data management (MDM)
  • Data consistency between operational applications; for example, involving data structures for SaaS applications or cloud-resident data sources, in a bidirectional or unidirectional manner
  • Interenterprise data sharing
  • Populating and managing data in a data lake: data is continuously collected and stored either in a semantically consistent approach similar to a traditional DBMS, or in a nontraditional DBMS whose semantics are refined by later data processing efforts. This brings a need to integrate nonrelational structures and to distribute computing workloads to parallelized processes (such as Hadoop and alternative NoSQL repositories)
  • Data migration: moving data to a new platform or to an updated version of an existing data management platform, or moving data from one application to a new application or to an upgraded version of that application.

IBM offers the following data integration products: 
  • IBM InfoSphere Information Server Enterprise Edition
  • IBM InfoSphere Information Server Enterprise Hypervisor Edition
  • IBM InfoSphere Federation Server
  • IBM InfoSphere Data Replication
  • IBM Data Integration for Enterprise
  • IBM Data Integration for Hadoop
  • IBM BigInsights BigIntegrate
  • IBM Streams 
  • IBM Bluemix Data Connect (previously DataWorks)

Informatica offers the following data integration products: 

  • Informatica platform (PowerCenter, PowerExchange, Data Replication, Advanced Data Transformation, Ultra Messaging, B2B Data Transformation, B2B Data Exchange and Data Integration Hub)
  • Informatica Data Services
  • Informatica Intelligent Cloud Services
  • Cloud Integration Hub
  • Big Data Management
  • Big Data Integration Hub
  • Informatica Intelligent Streaming
  • Informatica Intelligent Data Lake and Informatica Data Preparation
Oracle offers the following data integration products: 
  • Oracle Data Integrator (ODI)
  • Oracle Data Integrator Cloud Service
  • Oracle GoldenGate
  • Oracle GoldenGate Cloud Service
  • Oracle Data Service Integrator and Oracle Service Bus
SAP offers the following data integration products: 
  • SAP Data Services
  • SAP Replication Server
  • SAP Landscape Transformation Replication Server
  • SAP Remote Data Sync
  • SAP Data Hub
  • SAP HANA platform
  • SAP Cloud Platform Integration 
  • SAP Event Stream Processor
Talend offers the following data integration products:
  • Talend Open Studio
  • Talend Data Fabric
  • Talend Data Management Platform
  • Talend Platform for Big Data
  • Talend Data Services Platform
  • Talend Integration Cloud 
  • Talend Data Preparation

Functional Criteria of a Data Integration (DI) Tool:

1. Connectivity/adapter capabilities. Data source and target support, with the ability to interact with a range of different data structure types (a minimal connectivity sketch follows this list), such as:
  • Relational databases
  • Legacy and nonrelational databases
  • Various file formats
  • XML
  • Packaged applications, such as those for customer relationship management (CRM) and supply chain management (SCM)
  • SaaS and cloud-based applications and sources
  • Industry-standard message formats, such as electronic data interchange (EDI), Health Level Seven International (HL7) and Society for Worldwide Interbank Financial Telecommunication (SWIFT)
  • Parallel distributed processing environments such as Hadoop Distributed File System (HDFS) and other nonrelational-type repositories such as graph, table-style, document store and key-value DBMSs
  • Message queues, including those provided by application integration middleware products and standards-based products (such as Java Message Service)
  • Data types of a less-structured nature, such as those associated with social media, web clickstreams, email, websites, office productivity tools and content
  • Emergent sources, such as data on in-memory repositories, mobile platforms and spatial applications
  • Screen-scraping and/or user interaction simulations (for example, scripts to interact with the web, 3270 or VT100 terminals, and others)
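
To make the first of these concrete, the sketch below reads rows from a relational source over plain JDBC. It is only a minimal illustration, not any vendor's adapter: the driver, URL, credentials and the SRC_CUSTOMERS table are hypothetical placeholders, and a suitable JDBC driver is assumed to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class RelationalSourceReader {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; a real tool reads these from adapter configuration.
            String url = "jdbc:postgresql://localhost:5432/sourcedb";
            try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT id, name, updated_at FROM SRC_CUSTOMERS")) {
                while (rs.next()) {
                    // In a real pipeline each row would be handed to transformation and delivery steps.
                    System.out.printf("%d,%s,%s%n",
                            rs.getLong("id"), rs.getString("name"), rs.getTimestamp("updated_at"));
                }
            }
        }
    }
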
2. Modes of interaction. Ability to interact with a range of data structure types in different modes, including:
  • Bulk/batch acquisition and delivery
  • Granular trickle-feed acquisition and delivery
  • Change data capture (CDC) — the ability to identify and extract modified data (see the sketch after this list)
  • Event-based acquisition (time-based, data-value-based or links to application integration tools to interact with message request/reply, publish/subscribe and routing)
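
One simple way to approximate change data capture without reading database logs is to poll a last-modified timestamp column and keep a watermark between runs. The hedged Java sketch below illustrates the idea; the table, columns and watermark handling are assumptions for illustration only.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public class TimestampBasedCdc {
        // Extract only rows changed since the last successful run and return the new watermark.
        public static Timestamp extractChanges(Connection conn, Timestamp lastRun) throws SQLException {
            String sql = "SELECT id, name, updated_at FROM SRC_CUSTOMERS WHERE updated_at > ? ORDER BY updated_at";
            Timestamp watermark = lastRun;
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setTimestamp(1, lastRun);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Deliver the changed row downstream (trickle feed); here it is just printed.
                        System.out.println(rs.getLong("id") + " changed at " + rs.getTimestamp("updated_at"));
                        watermark = rs.getTimestamp("updated_at");
                    }
                }
            }
            return watermark; // persist this value so the next polling cycle starts from here
        }
    }
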
3. Data delivery capabilities. The ability to provide data to consuming applications, processes and databases in a variety of modes, including:
  • Physical bulk/batch data movement between data repositories, such as processes for ETL or extraction, loading and transformation (ELT)
  • Data virtualization
  • Message-oriented encapsulation and movement of data (via linkage with application integration tool capability)
  • Data synchronization when distributed datasets must resolve data collisions resulting from distinct changes in disparate copies of data to retain data consistency
  • Replication of data between homogeneous or heterogeneous DBMSs and schemas
  • Migration of data across versions of data repositories (such as databases, file systems, and so on) and applications (resolving logical differences to achieve physical migration)
4. Latency requirements:
  • Scheduled batch delivery
  • Streaming/near-real-time delivery
  • Event-driven delivery of data based on identification of a relevant event
5. Data transformation capabilities. Built-in capabilities for achieving data transformation operations of varying complexity (a small transformation sketch follows this list), including:
  • Basic transformations, such as data-type conversions, string manipulations and simple calculations
  • Transformations of intermediate complexity, such as look-up and replace operations, aggregations, summarizations, integrated time series, deterministic matching and the management of slowly changing dimensions
  • Complex transformations, such as sophisticated parsing operations on free-form text, rich media and patterns/events in big data
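
As a small illustration of the basic and intermediate transformations listed above, the Java sketch below performs a string-to-number conversion, a look-up-and-replace against a reference map, and an aggregation per group. The input rows and look-up values are made up for the example.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class SimpleTransformations {
        public static void main(String[] args) {
            // Hypothetical input rows: a country code and an amount that arrives as a string.
            List<String[]> rows = List.of(
                    new String[]{"US", "10.50"},
                    new String[]{"DE", "7.25"},
                    new String[]{"US", "3.00"});

            // Look-up and replace: map codes to full country names.
            Map<String, String> countryLookup = Map.of("US", "United States", "DE", "Germany");

            // Basic type conversion (string to double) plus aggregation per country.
            Map<String, Double> totalsByCountry = rows.stream().collect(Collectors.groupingBy(
                    r -> countryLookup.getOrDefault(r[0], "Unknown"),
                    Collectors.summingDouble(r -> Double.parseDouble(r[1]))));

            totalsByCountry.forEach((country, total) ->
                    System.out.printf("%s -> %.2f%n", country, total));
        }
    }
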
6. Metadata and data modeling support, including:
  • Automated discovery/acquisition of metadata from data sources, applications and other tools
  • Discernment of relationships between data models and business process models
  • Data model creation and maintenance
  • Physical-to-logical model mapping and rationalization
  • Ability to define model-to-model relationships via graphical attribute-level mapping
  • Lineage and impact analysis reporting, in graphical and tabular formats
  • An open metadata repository, with the ability to share metadata bidirectionally with other tools
  • Automated synchronization of metadata across multiple instances of the tools
  • Ability to extend the metadata repository with customer-defined metadata attributes and relationships
  • Documentation of project/program delivery definitions and design principles in support of requirements definition activities
  • A business analyst/end-user interface to view and work with metadata
7. Design and development environment capabilities. Facilities for enabling the specification and construction of data integration processes, including:
  • Graphical representation of repository objects, data models and data flows
  • Management of the development process workflow, addressing requirements such as approvals and promotions
  • Granular, role-based and developer-based security
  • Team-based development capabilities, such as version control and collaboration
  • Functionality to support reuse across developers and projects, and to facilitate the identification of redundancies
  • A common or shared user interface for design and development (of diverse data delivery styles, data integration and data quality operations, cloud and on-premises environments, and so on)
  • A business analyst/end-user interface to specify and manage mapping and transformation logic through the use of end-user functionality for data integration/preparation
  • Support for testing and debugging
8. Information governance support capabilities (via interoperation with data quality, profiling and mining capabilities in the vendor's or a third party's tools). Mechanisms to work with related capabilities to help with the understanding and assurance of data quality over time, including interoperability with:
  • Data profiling tools (profiling and monitoring the conditions of data quality); a minimal profiling sketch follows this list
  • Data mining tools (relationship discovery)
  • Data quality tools (supporting data quality improvements)
  • In-line scoring and evaluation of data moving through the processes
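
The profiling side of this can be as simple as counting nulls and distinct values per column. The sketch below is a minimal, hand-rolled example of that idea, not a stand-in for a real data profiling tool; the column name and values are invented.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Objects;

    public class ColumnProfiler {
        // Profile one column of an extracted dataset: row count, null count and distinct values.
        public static void profile(String columnName, List<String> values) {
            long nulls = values.stream().filter(Objects::isNull).count();
            long distinct = values.stream().filter(Objects::nonNull).distinct().count();
            System.out.printf("%s: rows=%d, nulls=%d, distinct=%d%n",
                    columnName, values.size(), nulls, distinct);
        }

        public static void main(String[] args) {
            profile("email", Arrays.asList("a@example.com", null, "b@example.com", "a@example.com"));
        }
    }
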
9. Deployment options and runtime platform capabilities. Breadth of support for the hardware and operating systems on which data integration processes may be deployed, and the choices of delivery model — specifically:
  • Mainframe environments, such as IBM z/OS and z/Linux
  • Midrange environments, such as IBM i or Hewlett Packard Enterprise (HPE) NonStop
  • Unix-based environments
  • Windows environments
  • Linux environments
  • On-premises (at the customer site) installation and deployment of software
  • Hosted off-premises software deployment (dedicated, single-tenant implementation)
  • Integration platform as a service (iPaaS), consumed by the customer completely "as a service" — the vendor provides cloud infrastructure; the customer does not install or administer the software
  • Cloud deployment support (requires organizations to deploy software in a cloud infrastructure); importantly, the ability to design once but deploy across multiple or even hybrid/mixed environments, on-premises, in the cloud, or both
  • In-memory computing environment
  • Server virtualization (support for shared, virtualized implementations)
  • Parallel distributed processing, such as Apache Hadoop, MapReduce, or leveraging Apache Spark or Hadoop YARN (Yet Another Resource Negotiator)
10. Operations and administration capabilities. Facilities for enabling adequate ongoing support, management, monitoring and control of the data integration processes implemented by the tools (a small error-handling sketch follows this list), such as:
  • Error-handling functionality, both predefined and customizable
  • Monitoring and control of runtime processes, both via functionality in the tools and through interoperability with other IT operations technologies
  • Collection of runtime statistics to determine use and efficiency, as well as an application-style interface for visualization and evaluation
  • Security controls, for both data in-flight and administrator processes
  • A runtime architecture that ensures performance and scalability
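
To give a feel for the error-handling and runtime-statistics points above, the sketch below wraps one integration step with simple retry logic and reports how long it took. It is a generic illustration, not how any particular product implements these facilities.

    import java.util.concurrent.Callable;

    public class RetryingStep {
        // Run one integration step with simple retries and report basic runtime statistics.
        public static <T> T runWithRetry(Callable<T> step, int maxAttempts) throws Exception {
            long start = System.currentTimeMillis();
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    T result = step.call();
                    System.out.println("Step succeeded on attempt " + attempt
                            + " in " + (System.currentTimeMillis() - start) + " ms");
                    return result;
                } catch (Exception e) {
                    System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                    if (attempt == maxAttempts) {
                        throw e; // surface the error after the final retry
                    }
                }
            }
            throw new IllegalStateException("unreachable");
        }
    }
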
11. Architecture and integration capabilities. The degree of commonality, consistency and interoperability between the various components of the data integration toolset, including:
  • A minimal number of products (ideally one) supporting all data delivery modes
  • Common metadata (a single repository) and/or the ability to share metadata across all components and data delivery modes
  • A common design environment to support all data delivery modes
  • The ability to switch seamlessly and transparently between delivery modes (bulk/batch versus granular real-time versus federation) with minimal rework
  • Interoperability with other integration tools and applications, via certified interfaces, robust APIs and links to messaging support
  • Efficient support for all data delivery modes, regardless of runtime architecture type (centralized server engine versus distributed runtime)
  • The ability to execute data integration in cloud and on-premises environments, as appropriate, where developed artifacts can be interchanged, reused and deployed across both environments with minimal rework
12. Service enablement capabilities. As acceptance of data service concepts continues to grow, data integration tools must exhibit service-oriented characteristics and provide support for SOA, such as (a minimal data service sketch follows this list):
  • The ability to deploy all aspects of runtime functionality as data services (for example, deployed functionality can be called via a web services interface)
  • Management of publication and testing of data services
  • Interaction with service repositories and registries
  • Service enablement of development and administration environments, so that external tools and applications can dynamically modify and control the runtime behavior of the tools
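
As a minimal picture of what "deploying functionality as a data service" can mean, the sketch below exposes a small, hard-coded dataset over HTTP using the JDK's built-in HttpServer. The endpoint, port and payload are assumptions for illustration; real tools publish services through their own runtimes.

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class CustomerDataService {
        public static void main(String[] args) throws Exception {
            // Expose a hard-coded dataset at /customers so consumers can call it as a service.
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/customers", exchange -> {
                byte[] body = "[{\"id\":1,\"name\":\"Acme\"}]".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            System.out.println("Data service listening on http://localhost:8080/customers");
        }
    }
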


Talend 



Repository Tab 

Context and Metadata 

Headers 

Mapping Data 

Data Generation 

Outputting tMap Rejects

Filtering Mapped Data 

Mapping Database Data 

Regex Expressions 
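
As a small, hedged illustration of the kind of regular-expression work these jobs involve, the Java snippet below pulls an area code and local number out of a phone string; the pattern and sample value are made up and are not tied to any specific Talend component.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexExample {
        public static void main(String[] args) {
            // Capture the area code and local number from a simple US-style phone string.
            Pattern phone = Pattern.compile("\\((\\d{3})\\)\\s*(\\d{3}-\\d{4})");
            Matcher m = phone.matcher("(212) 555-0100");
            if (m.matches()) {
                System.out.println("Area code: " + m.group(1) + ", number: " + m.group(2));
            }
        }
    }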

XML to CSV conversion 

CSV to XML conversion 

Advanced XML 

Joining Data

Splitting Data 

Performing Advanced Data Lookups 

Uniting Data 


Retrieving and working with File Properties


Row Count 

Temp, Delete, Move, Rename 

Basics of the tJava component
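
A tJava component simply runs a block of Java once at its point in the job. The fragment below is a hedged example of the kind of code typed into its Code box rather than a standalone class; the context variable name is hypothetical, and context/globalMap are the variables Talend jobs expose to such code.

    // Example of code placed in a tJava component's Code box (variable names are hypothetical).
    System.out.println("Job started for environment: " + context.environment);

    // Store a value in the globalMap so later components in the same job can read it.
    globalMap.put("runStartedAt", new java.util.Date());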

More features of tJava component 

tJavaRow component 
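
tJavaRow differs from tJava in that its code runs once per row, reading from input_row and writing to output_row as defined by the component's schema. The fragment below is a hedged sketch; the column names are invented.

    // Per-row code placed in a tJavaRow component; column names here are hypothetical.
    output_row.id = input_row.id;
    output_row.name = input_row.name == null ? "" : input_row.name.trim().toUpperCase();
    output_row.loadDate = new java.util.Date();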

tJavaFlex component 

Custom Routines 
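
Custom routines are plain Java classes of static helper methods that Talend keeps in the routines package, so they can be called from component and tMap expressions. Below is a minimal, hypothetical routine; the class and method names are made up for the example.

    package routines;

    public class StringCleanup {
        /**
         * Hypothetical helper: trim a value and collapse internal whitespace.
         * Callable from expressions as StringCleanup.normalize(row1.name), for example.
         */
        public static String normalize(String value) {
            if (value == null) {
                return null;
            }
            return value.trim().replaceAll("\\s+", " ");
        }
    }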

Databases in Talend 

Working with HyperSQL (HSQLDB)

Looking at Table Schemas

Importing Table Schemas

Reading Data from a Database 

Writing Data to a Database 

Debugging 

Reject Handling 

Reject Handling with the tMsgBox component

Saving Console Log to a flat file 

tMap Input Reject Handling 

Logging Job Information 

Monitoring Jobs 

Building Talend Jobs

Adding context from command line 

Additional context variable modifications 

Scheduling Talend Jobs

Sub Jobs 

Custom Components 

Creating a job with tFileExcel Custom Components 

Using tSystem Component 

Using tSCP components



