ETL NoSQL Best Practices

Mongo, Redis, Elasticsearch or Couch

Legacy ETL tools are a poor fit for NoSQL databases. They assume tabular data and are not well suited to the hierarchical Json data typically used in NoSQL databases. They also assume batch processing and JDBC-style connections, whereas ETL for NoSQL use cases is likely to require a RESTful interface and streaming for low latency. Worst of all, legacy ETL tools bind to a fixed schema, which negates the schema-less advantage of NoSQL. The goal for ETL NoSQL is to keep the flexibility of a schema-less NoSQL database while still being able to manage the schema for data integration with minimal effort.

METL is a powerful ETL NoSQL tool that addresses these issues. It uses Json for all internal processing. METL can read from or write to nearly any data source, including SQL, NoSQL, flat files or REST. When reading from non-Json sources, Intelligent Integration immediately converts the data into Json for processing.
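
To make the "convert to Json on read" idea concrete, here is a minimal sketch of turning tabular text into Json documents. It is not Intelligent Integration's actual code: the function name `rows_to_json_docs` and the convention that a dotted column header such as `address.city` becomes a nested object are assumptions made only for illustration.

```python
import csv
import io
import json


def rows_to_json_docs(csv_text):
    """Convert CSV text into a list of Json-ready dicts (hypothetical helper).

    Column headers become keys. A dotted header such as "address.city" is
    expanded into a nested object; that convention is assumed for this sketch.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {}
        for header, value in row.items():
            target = doc
            *parents, leaf = header.split(".")
            for part in parents:
                # Walk (or create) the nested objects leading to the leaf key.
                target = target.setdefault(part, {})
            target[leaf] = value
        docs.append(doc)
    return docs


sample = (
    "id,name,address.city,address.zip\n"
    "1,Ada,London,N1\n"
    "2,Linus,Helsinki,00100\n"
)
docs = rows_to_json_docs(sample)
print(json.dumps(docs[0], indent=2))
```

All values stay strings at this stage; datatype detection would be a separate step applied once the data is in Json form.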

Following a declarative programming paradigm, Intelligent Integration separates the “what to do” from the “how to do it”. The “what” is the business requirements: data mappings, data integration and data modeling. The “how” is the more technical requirements of moving data from point A to point B, such as error management, batch management and data cleansing. The “how” portion is either already implemented by Intelligent Integration’s code or is pre-configured during setup.

The “what to do” is metadata organized in a user-friendly data dictionary format. This brings visibility and understanding without needing to know all of the ETL’s technical details. Because the metadata lives in a single repository, it can be managed at the data model level and changed en masse.
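
The split between “what” and “how” can be sketched in a few lines. The data dictionary layout below (`source`, `target`, `type`) and the `apply_mappings` engine are hypothetical and are not Intelligent Integration's real metadata format; they only illustrate how mappings can live in data while one generic routine handles casting, nesting and error logging.

```python
# Hypothetical data dictionary: each entry states WHAT to map, not HOW.
DATA_DICTIONARY = [
    {"source": "cust_id", "target": "customer.id", "type": "int"},
    {"source": "cust_nm", "target": "customer.name", "type": "str"},
    {"source": "ord_total", "target": "order.total", "type": "float"},
]

# Casting functions the generic engine knows about (assumed for this sketch).
CASTS = {"int": int, "float": float, "str": str}


def apply_mappings(record, dictionary):
    """Generic engine (the "how"): walk the metadata and build the output doc.

    Returns (document, errors). A failed cast is logged, not raised.
    """
    doc, errors = {}, []
    for entry in dictionary:
        if entry["source"] not in record:
            continue  # source field absent from this record; nothing to map
        raw = record[entry["source"]]
        try:
            value = CASTS[entry["type"]](raw)
        except (ValueError, TypeError):
            errors.append(
                f"{entry['source']}: cannot cast {raw!r} to {entry['type']}"
            )
            continue
        # Expand the dotted target path into nested objects.
        target = doc
        *parents, leaf = entry["target"].split(".")
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = value
    return doc, errors


doc, errors = apply_mappings(
    {"cust_id": "42", "cust_nm": "Ada", "ord_total": "19.99"}, DATA_DICTIONARY
)
bad_doc, bad_errors = apply_mappings({"cust_id": "abc"}, DATA_DICTIONARY)
```

Changing a mapping here means editing one dictionary entry rather than touching the engine, which is the property the data dictionary approach relies on.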

  • All mappings and transformations are defined in simple metadata
  • Intelligent Integration’s metadata can be created surprisingly quickly by various means: automatically/dynamically from an incoming Json document’s schema, imported from an existing database’s schema, Master Data Management or an ERD tool, or entered ad hoc.
  • Intelligent Integration will add new elements to the metadata automatically, including the datatype, and apply data transformations based on rules.
  • Flat file loads/exports are automated by reading column headers and auto-mapping. Drop a file in a folder and Intelligent Integration will read and map based on your rules.
  • Near real-time or batch. Intelligent Integration includes built-in job and batch controls with audit logs, or can run with low latency as a Windows service/Linux daemon
  • Endpoint on an enterprise service bus
  • Intelligent Integration is an object-oriented Java app that separates the input of data (Extract) from the mapping/transformation/write (Transform and Load), so it is very flexible about where data comes from.
  • Built using Java and Json, Intelligent Integration is an easily extensible Json ETL tool.
  • Merge data from separate sources into a single Json destination. For example, merge multiple normalized SQL tables with differing cardinalities into one hierarchical Json document.
  • Data type detection and/or validation with error logging
  • Extremely configurable with metadata managed workflow
  • Json ETL has the flexibility, code reuse and agile development of object oriented programming
  • Any source or destination: SQL, NoSQL, Hadoop, Flat File, REST or Enterprise Service Bus
  • Enterprise Data Governance: Monitor & manage data propagation. ETL “code” is a black box to DBAs
  • Automatic schema generation at destination: SQL, Flat File and obviously NoSQL
  • Scalable: Multi-threaded reads or writes and across servers/nodes
  • Intelligent Integration Json ETL has powerful Data Transformations:
    Split, Merge, Pivot, Pivot and Log, Look Up (cached, non-cached or REST call), Computed Columns/Functions, Filtering, Multi-tenant destination, Obfuscate Personally Identifiable Information,
    Json Hierarchical Merge (typically several relational source tables combined into a single Json document)
  • Many transforms can be implemented by a single mouse click and/or manipulated through ad-hoc SQL.
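
The Json Hierarchical Merge listed above can be illustrated with a small sketch that folds three normalized tables into nested documents. The table names, column names and the `merge_hierarchy` function are invented for this example; it shows the general technique (group child rows by foreign key, then attach them to the parent), not Intelligent Integration's implementation.

```python
from collections import defaultdict

# Hypothetical normalized source tables, as lists of row dicts.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Linus"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]
order_lines = [
    {"order_id": 10, "sku": "A-1", "qty": 2},
    {"order_id": 10, "sku": "B-7", "qty": 1},
    {"order_id": 11, "sku": "A-1", "qty": 5},
]


def group_by(rows, key):
    """Group rows by a foreign key, dropping that key from each child row."""
    grouped = defaultdict(list)
    for row in rows:
        child = {k: v for k, v in row.items() if k != key}
        grouped[row[key]].append(child)
    return grouped


def merge_hierarchy(customers, orders, order_lines):
    """Nest order lines under orders, then orders under customers."""
    lines_by_order = group_by(order_lines, "order_id")
    orders_with_lines = [
        {**order, "lines": lines_by_order.get(order["order_id"], [])}
        for order in orders
    ]
    orders_by_customer = group_by(orders_with_lines, "customer_id")
    return [
        {**customer, "orders": orders_by_customer.get(customer["customer_id"], [])}
        for customer in customers
    ]


docs = merge_hierarchy(customers, orders, order_lines)
```

Each one-to-many relationship becomes an embedded array, and a customer with no orders still yields a document with an empty `orders` list.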
