CIO Insider

CIOInsider India Magazine


Datametica: Smooth Data Warehouse Migrations using Scientifically Designed Automation Frameworks

Deepak Badhani, Director & Co-founder, and Niraj Kumar, Co-founder & CTO

A USD 3 billion insurance company from Ohio wanted to move away from its legacy Netezza data warehouse. Its top priorities were reducing operating costs and improving platform performance, and its administrators also wanted to eliminate problematic data silos to optimize network efficiency. Google Cloud Premier Services Partner Datametica played a prominent role in migrating the warehouse to BigQuery and helped implement the new processes and technologies without disruption. This was a highly complex Netezza migration, but the Datametica EDW Migration Toolset made it possible to complete it within four months, while other service providers were quoting 12 months.

BigQuery is now the insurer's primary data lake and gives employees substantially faster queries than the previous Netezza-based solution. For instance, an 800 GB query that used to take three hours on the legacy platform now runs in seven minutes, which is more than 25 times faster. By moving to Google Cloud Platform, the insurer also gained simpler platform management, lower costs, and better security for high-value customer information. Datametica continues to support the client as it moves its data management and analytics capabilities to the cloud.

Pune-based Datametica is also adept at resolving many other challenges that hinder data-driven business decisions. It simplifies complex architectures by mapping and benchmarking requirements against cost-effective, efficient technology and cloud components, which removes the confusion created by the many technology options available. Onboarding any use case requires a strong foundation. Datametica's foundation setup covers the provisioning of infrastructure, security, governance, data pipelines, data storage, and DevOps. Once the foundation and use case are defined, coordinating several third-party vendors for the various technology components can become chaotic. To avoid this, Datametica offers an all-in-a-box solution. It includes a clear segregation of responsibilities among all the associated vendors and a detailed communication plan for implementing the project successfully.

End-to-End Data Warehouse Migration Implementation
Maintaining an on-premise data warehousing solution is increasingly challenging because of the storage, processing power, software, tools, and operational costs it demands. This creates the need to move these data centers to the Cloud to meet future needs and reduce the total cost of ownership. However, migrating a data warehouse to the Cloud is no cake walk. It requires detailed planning and a well-thought-out strategy to execute the migration. Datametica is a recognized world leader in data warehouse migration to the Cloud. It deploys an innovative toolset that automates planning, code conversion, data migration, and data validation during parallel runs. These toolsets reduce implementation time by 50 percent and cost by 60 percent. Datametica brings extensive production experience in moving Teradata, Netezza, Oracle, Greenplum, and other data warehouses to the Cloud (GCP, AWS, and Azure) and to on-premise Hadoop platforms. The company has achieved the highest level of partnership with Google, AWS, and Microsoft, and works very closely with them to design, architect, build, migrate, and manage workloads.

Datametica's data warehouse migration offering is made up of four solutions: Eagle, Raven, Condor, and Pelican.

I. Eagle or the Planner - Automated Assessment and Migration Planner: Eagle was designed to overcome the major challenges of migration, and it focuses on discovery, strategy, and project planning. The tool performs deep analysis and lets users drill down into data warehouses and their related ETL workloads. Precision is its core design parameter. Eagle detects inefficient and unidentifiable workloads and provides a detailed, interactive experience that turns the migration into a meticulous strategy. It uses machine-based algorithms to break complex warehouse systems into logical chunks, and from these it produces a complete, sprint-wise project plan.

II. Raven or the Transformer - Quick, Automated and Error-Free Conversion of Workloads: This module automates the translation of workloads to native Cloud technologies. It saves more than 50 percent of the time that traditional methods of workload transfer require. It also translates legacy data warehouse functions that the target platform does not support into their future-state equivalents.

III. Condor or the Transporter - Flexible and Reliable Data Ingestion Framework: Condor helps organizations that are adopting serverless computing, where general-purpose ETL tools are becoming obsolete. It helps build data pipelines and acquire data from all sources on the Cloud. Condor is uniquely designed to configure source systems and ingest their data into the Cloud environment.

IV. Pelican or the Monitor - Automated Data Validation at Scale: Pelican assures businesses that the existing and new systems continuously match at the most granular level throughout the migration. The module provides a sustainable and powerful way to validate data between on-premise systems and the Cloud at the file, table, row, or cell level, even with petabytes of data. It uses machine learning and hashing algorithms to validate data rapidly and at scale while greatly reducing data movement.
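Pelican's description points to a common pattern for validating data with little data movement. Each row is hashed on both platforms, the two sides compare compact fingerprints, and individual rows are examined only when the fingerprints disagree. The Python sketch below illustrates that general pattern. It is not Datametica's implementation. The function names (row_hash, table_fingerprint, mismatched_rows) and the JSON-based row normalization are illustrative assumptions.

```python
import hashlib
import json
from collections import Counter
from typing import Iterable, Sequence

Row = Sequence[object]
MOD = 2 ** 256  # fingerprints are kept to SHA-256 width


def row_hash(row: Row) -> int:
    """Hash one row after normalizing it to a canonical JSON string.

    Assumption: both platforms render values identically. A real tool
    would also normalize types, time zones and float precision.
    """
    canonical = json.dumps(list(row), default=str)
    return int.from_bytes(hashlib.sha256(canonical.encode("utf-8")).digest(), "big")


def table_fingerprint(rows: Iterable[Row]) -> tuple[int, int]:
    """Order-independent table fingerprint: (row count, sum of row hashes mod 2**256)."""
    count, total = 0, 0
    for row in rows:
        count += 1
        total = (total + row_hash(row)) % MOD
    return count, total


def mismatched_rows(source: Iterable[Row], target: Iterable[Row]) -> tuple[Counter, Counter]:
    """Drill-down step: row hashes missing from the target and extra in the target."""
    src = Counter(row_hash(r) for r in source)
    tgt = Counter(row_hash(r) for r in target)
    return src - tgt, tgt - src


# Usage: the same rows in a different order produce the same fingerprint.
legacy = [(1, "Alice", 120.5), (2, "Bob", None)]
cloud = [(2, "Bob", None), (1, "Alice", 120.5)]
print(table_fingerprint(legacy) == table_fingerprint(cloud))
```

In a real deployment, the fingerprint would typically be computed close to each system, so only a row count and a 256-bit sum cross the network. Row-level hashes would be exchanged only for tables whose fingerprints differ. The sketch sums the hashes instead of XOR-ing them, so duplicate rows still affect the result.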

Datametica also builds custom AI and analytics solutions for complex business needs, covering both structured and unstructured data. Some of its advanced analytics use cases are award-winning, with accuracy rates as high as 90 percent. They range from customer-centric solutions and social media analytics to supply chain optimization across various industries. "Datametica works with both open source technologies and the readily available SaaS solutions on the cloud (like Azure ML, Amazon Machine Learning and Google Machine Learning)," confirms Niraj Kumar, Co-founder & CTO at Datametica.

Leadership Team Driving Success for Datametica
Niraj Kumar is an innovation specialist and the Chief Architect behind complex, future-proof solutions built on next-generation technologies. He provides actionable insights through interactive workshops and advisory services that focus on the strategy and architecture of data management and analytics solutions. Niraj is also a member of the Forbes Technology Council.
Deepak Badhani, the company's Director and other Co-founder, holds global responsibility for transforming the Services organization into a more profitable global delivery setup with happy customers. He is recognized for consistently meeting profitability goals, turning around delivery and project management, and driving increased revenues and market share.

Datametica has always aimed to build automation that enables modern data platforms, both on the Cloud and on-premise. "This will help businesses find and capture hidden value in data through a unique blend of business acumen, big data and machine learning. Our partnership with the likes of Google, AWS and Microsoft has led to growth both in technology enablement and in customer spread across domains," adds Deepak. Datametica has already made giant strides in enabling smooth migrations, and the company shows promise to keep up the pace and perfect the art.

Use cases in social media analytics:

• Rumor Detection - Designed a model that helps the surveillance team of a stock exchange automatically classify news articles as rumors or non-rumors. The model is built on a machine learning algorithm. Its classification threshold is revised iteratively based on user feedback, which has led to an accuracy of up to 92 percent.

• Video News to Text Converter - Developed a 'Video to Text Convertor Engine (Deep Speech 2)' for applications such as survey collection, captioning, and sentiment analysis. The Datametica team gathered 1,000 hours of speech data from CNBC TV18 news to train the system. All speech data were recorded in mono and sampled at 16 kHz.

• e-Paper News Extraction - Built a rule-based engine that extracts text from images of e-newspapers.
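The rumor detection use case mentions a classification threshold that is "revised iteratively based on user feedback." One simple way to do this is to re-pick the cut-off on the model's rumor score whenever the surveillance team labels a new batch of articles. The Python sketch below does exactly that, choosing the threshold that maximizes accuracy on the accumulated feedback. The class FeedbackThreshold, its methods, and the accuracy-maximizing rule are hypothetical illustrations, not the model Datametica built.

```python
from typing import Sequence


def accuracy(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """Share of articles classified correctly when score >= threshold means 'rumor'."""
    correct = sum((s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


def best_threshold(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Try every observed score (plus +inf, i.e. 'nothing is a rumor') as the cut-off.

    Ties go to the lowest threshold because candidates are sorted ascending
    and max() keeps the first maximum it finds.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return max(candidates, key=lambda t: accuracy(scores, labels, t))


class FeedbackThreshold:
    """Hypothetical helper that re-tunes the threshold as feedback batches arrive."""

    def __init__(self, initial: float = 0.5) -> None:
        self.threshold = initial
        self.scores: list[float] = []
        self.labels: list[bool] = []

    def classify(self, score: float) -> bool:
        """True means the article is flagged as a rumor."""
        return score >= self.threshold

    def add_feedback(self, scores: Sequence[float], labels: Sequence[bool]) -> None:
        """Store analyst-confirmed labels and re-pick the threshold on all feedback so far."""
        self.scores.extend(scores)
        self.labels.extend(labels)
        self.threshold = best_threshold(self.scores, self.labels)


# Usage: two rounds of feedback move the cut-off.
model = FeedbackThreshold()
model.add_feedback([0.2, 0.4, 0.6, 0.8], [False, False, True, True])
print(model.threshold)
model.add_feedback([0.55], [True])
print(model.threshold)
```

Trying every observed score costs quadratic time. That is fine for a sketch, but a production system would sort the scores once or optimize a metric that fits the surveillance team's priorities, such as recall on true rumors.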