CIO Insider

Career Options In Data & AI

Hemanth Kumar, Practice Head, Analytics and Data Management, Acuvate Software

Acuvate focuses on creating business solutions that improve the quality of workplaces. Presently, the organization has served 100+ clients globally while delivering over 15 packaged IP solutions and tools, 300 SharePoint and 50 BI & analytics implementations.

With today’s technology, companies can collect tremendous amounts of data with relative ease, and many have more data than they can handle. This data, however, is meaningless until it is analyzed for trends (Which product's sales are slowing down, and which one is consistently on the rise?), patterns and relationships (Which product does a specific age group prefer?) and other useful information (Does the population of a certain region like a specific sport or product?).

The decisions based on this analysis affect the company by either increasing its revenue (top line) or cutting costs to improve its profit (bottom line).

Our universities tend to focus on RDBMS concepts (normalization, writing queries), Big Data (Hadoop) and programming languages (C, C++, C#, Java, Python). Recently, they have started covering statistical tools like R and the statistics capabilities of Python for data analysis and data mining. This is a good trend in my view, but there are gaps, and I think students need to be made aware of many more possibilities and fundamentals in the Data & AI space.

Developing a model for decision making is a seven-step process:
1. Define the problem we are trying to solve.
2. Collect and summarize the data.
3. Develop a model based on various quantitative methods.
4. Verify that the model is a fair representation of reality.
5. Select the future course of action based on the results.
6. Present the results and recommend the required action to the organization.
7. Implement the model and improve it over time by adding other influencing parameters.

I would like to introduce two important roles, the distinction between them, and what one needs to be aware of to succeed in each:
1. Data Engineer
2. Data Scientist

The second step, collecting and summarizing the data, is possibly one of the most critical, yet it is often not given its due importance. Many students are not aware of the career possibilities in this space or of the tools and concepts the role requires, so the industry typically has to train them. It is a very interesting field, traditionally called Business Intelligence, but many freshers do not know it exists.

This is the role of the Data Engineer, who uses ETL (Extract, Transform and Load) tools to collect data from various sources and store it in a central repository called a Data Warehouse. Building this Data Warehouse requires technical skills such as writing stored procedures and using ETL tools like SQL Server Integration Services and Informatica, knowledge of business and dimensional modeling processes like the Kimball methodology, and a good understanding of how the business works.
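To make the Extract, Transform and Load idea concrete, here is a minimal sketch using only Python's standard library. It is not how SQL Server Integration Services or Informatica work internally; the sample CSV data, the `sales` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the Data Warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read files, APIs or
# operational databases instead of an inline string.
RAW_SALES_CSV = """order_id,region,amount
1,north,120.50
2,South,80.00
3,north,
4,East,42.25
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and normalize the values."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().title(), float(row["amount"]))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the central repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the Data Warehouse (assumption).
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_SALES_CSV)))

totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Keeping the three stages as separate functions mirrors how ETL tools separate source connections, transformations and destinations, which makes each stage easier to test on its own.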

To become successful in this area, a person needs an affinity for understanding how industries such as Consumer Products and Goods, Insurance or Telecom (typically called verticals) work and how they make their decisions. They also need a flair for technology, quantitative aptitude, and an understanding of data warehousing concepts: the Inmon and Kimball approaches, the difference between star and snowflake schemas, OLTP versus OLAP, and dimensional modeling.
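As an illustration of dimensional modeling, the sketch below builds a tiny star schema: one fact table of sales surrounded by two dimension tables. The table names, columns and sample rows are invented for this example, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/where" of each sale.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    CREATE TABLE dim_region (
        region_id   INTEGER PRIMARY KEY,
        region_name TEXT
    );
    -- The fact table holds the measures and points at the dimensions.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        quantity   INTEGER,
        revenue    REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Cricket Bat', 'Sports'),
        (2, 'Football', 'Sports'),
        (3, 'Notebook', 'Stationery');
    INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 2, 100.0),
        (2, 2, 1, 1, 30.0),
        (3, 3, 2, 10, 50.0),
        (4, 1, 2, 1, 50.0);
    """
)

# A typical analytical (OLAP-style) question: revenue per product category.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(revenue_by_category)
```

In a snowflake schema, the `category` column would be moved out of `dim_product` into its own table, trading simpler joins for less repeated data.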

The second, but in no way lesser, role is the Data Scientist, who also needs a very good understanding of how these businesses work and how decisions are made. The Data Scientist needs a very good grasp of statistics. Suppose you can analyze an event like a cricket match and predict what is likely to happen from the numbers alone: how many times a certain team has won against another, with what kind of player composition, and in what kind of situations. You will of course predict wrong at times, but if you get it right the majority of the time, then this is a career option you could consider.

To do well in this role, you would have to concentrate on learning statistical tools like R and Python (statistics is one of Python's capabilities), and also Excel (though old, you will be surprised how many companies still use it as a fundamental data analysis tool). You would also need core concepts such as mean, median, mode, standard deviation, logistic regression, linear regression and linear programming, to name a few. Beyond these there are more advanced concepts like Deep Learning, which is used for vision, natural language processing and so on.
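A few of these core concepts can be tried out with nothing more than Python's standard library. The numbers below are made up for illustration, and `fit_line` is a hypothetical helper that implements simple least-squares linear regression by hand rather than calling a statistics package.

```python
import statistics

# Hypothetical weekly unit sales for one product.
weekly_sales = [12, 15, 15, 18, 20, 22, 24]

mean = statistics.mean(weekly_sales)      # average value
median = statistics.median(weekly_sales)  # middle value when sorted
mode = statistics.mode(weekly_sales)      # most frequent value
stdev = statistics.stdev(weekly_sales)    # sample standard deviation


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (x) against sales (y).
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
slope, intercept = fit_line(ad_spend, sales)

# Use the fitted line to estimate sales at a spend level not yet observed.
predicted = intercept + slope * 6

print(mean, median, mode, round(stdev, 2))
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

The same calculations can be reproduced in Excel or R, which is a useful way to check your understanding of each concept before moving on to larger libraries.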

I hope this has given you, the future engineers and managers who might come across these roles, a perspective on what is possible in the Data & AI space. It is an exciting and interesting field that is growing rapidly. There is a reason for the quote “Data is the new Oil”: companies realize they are sitting on crude (data) and wish to convert it into something useful, the oil (insights and decisions).
