
What is MetaCroc #
MetaCroc is a metadata-driven development tool. It stores metadata about the elements needed for data manipulation in data platforms, e.g., data structures (tables), data transformations (ETLs/ELTs) or workflows.
Who MetaCroc is for #
MetaCroc is a tool designed for data engineers, i.e., specialists with (at least) the following knowledge and skills:
- SQL, i.e., the ability to construct SQL commands
- data warehousing, i.e., knowledge of the most common data architectures, design patterns for all layers of data warehouses (or data platforms in general), etc.
- the ability to understand data-related business needs, translate them into data structures and data sets, and validate the results with business users
Metadata transformation #
Metadata stored in MetaCroc is transformed into scripts for the target data platform technology. Transformations are performed via templates (Apache FreeMarker (TM) is used for this purpose). The templates are open and can be adjusted and changed as needed. Besides DDL and DML scripts, it is possible to generate other metadata-based outputs, e.g., HTML or Markdown documentation.
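As a rough illustration of the template-driven approach, the sketch below renders a `CREATE TABLE` script from a small metadata record. It uses Python's built-in `string.Template` as a stand-in for Apache FreeMarker. The metadata layout (`name`, `columns`, `type`, `nullable`) and the `render_ddl` helper are assumptions made for this example, not MetaCroc's actual metadata model or templates.

```python
from string import Template

# Hypothetical metadata record; MetaCroc's real metadata model may differ.
table_meta = {
    "name": "dim_customer",
    "columns": [
        {"name": "customer_id", "type": "INTEGER", "nullable": False},
        {"name": "customer_name", "type": "VARCHAR(200)", "nullable": True},
    ],
}

# Stand-in for a FreeMarker template: placeholders are filled from metadata.
DDL_TEMPLATE = Template("CREATE TABLE $table_name (\n$column_lines\n);")


def render_ddl(meta: dict) -> str:
    """Render a CREATE TABLE script from one table's metadata."""
    column_lines = ",\n".join(
        "    {name} {type}{null}".format(
            name=col["name"],
            type=col["type"],
            null="" if col["nullable"] else " NOT NULL",
        )
        for col in meta["columns"]
    )
    return DDL_TEMPLATE.substitute(
        table_name=meta["name"], column_lines=column_lines
    )


print(render_ddl(table_meta))
```

Because the template is kept apart from the metadata, the same record could in principle feed a second template that produces Markdown or HTML documentation instead of DDL.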
Data Security #
MetaCroc manages only its own metadata. It does not access any data directly, so it does not need a connection to databases or similar systems.
Typically, structures are imported from CSV files generated from the database catalog. Output scripts are pushed to Git feature branches and deployed by a Git (or any other) CI/CD pipeline, which writes information about the versions deployed to each environment back to MetaCroc. Outside MetaCroc, the development team can manage pull requests, perform peer reviews, move generated documentation to a wiki, etc. MetaCroc exposes APIs for integration with other tools (e.g., data governance tools that publish a data catalog for data users).
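To make the CSV-based import more concrete, here is a minimal sketch that groups rows of a catalog export into per-table column metadata. The CSV column names (`table_name`, `column_name`, `data_type`, `is_nullable`) and the `load_structures` helper are hypothetical; the real export format depends on the database and on the catalog query used, and MetaCroc's own import may work differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical catalog export; real column names depend on the database
# and on the query used to produce the file.
CATALOG_CSV = """table_name,column_name,data_type,is_nullable
dim_customer,customer_id,INTEGER,NO
dim_customer,customer_name,VARCHAR(200),YES
fact_sales,sale_id,INTEGER,NO
"""


def load_structures(csv_text: str) -> dict:
    """Group catalog rows into per-table column metadata."""
    tables = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        tables[row["table_name"]].append(
            {
                "name": row["column_name"],
                "type": row["data_type"],
                "nullable": row["is_nullable"] == "YES",
            }
        )
    return dict(tables)


structures = load_structures(CATALOG_CSV)
for table, columns in structures.items():
    print(table, [c["name"] for c in columns])
```

Only catalog information (table and column definitions) passes through such an import; no rows of business data are read.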
Development Lifecycle #
MetaCroc is one part of the development lifecycle; it does not cover the whole lifecycle. Its key purpose is to enter, store and manage the metadata needed to generate scripts for data structure definition, data transformation and management of ETL workflows. MetaCroc can be connected to Git for further processing of the scripts.
The rest of the development lifecycle needs to be set up and adjusted to the needs of the specific project and team. CI/CD pipelines can be covered by Git itself or by tools like Jenkins, peer review can be a part of pull requests, etc.
MetaCroc is open and can be integrated via API.
Supported Data Architectures #
MetaCroc is fully customizable and can support most data architectures and governance approaches (e.g., pure DWH, Lakehouse, both Kimball and Inmon approaches, Data Mesh).