Wednesday, 2 December 2020

Jay Oza's Questions on AI and Machine Learning

Five Questions for Dr. Umesh Rao Hodeghatta, author of "Business Analytics Using R - A Practical Approach"

To succeed today, you have to get familiar with, and even master, artificial intelligence and machine learning. The companies that are thriving have mastered AI/ML. It has become table stakes very fast. 

Jim Cramer hosts a show on CNBC in the United States called "Mad Money" that really should be called "Math Money." The reason is that he is constantly talking about companies like NVIDIA, Facebook, Amazon, and many others that have been making people mad money on Wall Street because of their mastery of data, math, and algorithms. They will continue to get bigger, cheaper, faster, better, and smarter because of their mastery of artificial intelligence and machine learning (AI/ML).

To better understand this difficult yet important subject, I came across an excellent article written by Dr. Umesh Hodeghatta titled "Challenges of Executing AI/Machine Learning Projects." The article lays out that there is a lot involved in successfully implementing AI/ML projects. 

Dr. Hodeghatta is a leading practitioner and thought leader in the field of AI/Machine Learning. He has authored two technical books: The Infosec Handbook: An Introduction to Information Security and Business Analytics Using R - A Practical Approach. He speaks about AI/ML at conferences all over the world. He has done over twenty webinars on BrightTALK. He has taught AI/ML at universities in both the United States and abroad. He is highly sought after for consulting by Fortune 500 companies. He is a co-founder of Nu-Sigma, an AI/ML solutions provider. So he is the right person to ask questions about AI/ML that many of you have, whether you work as an individual contributor or as a C-Level executive.

Question 1
Can small companies take on AI projects? What is the approach you recommend
they use?

Yes, small companies can take on AI projects. Many small companies have good-quality data but may not know how to use it to address their business problems.

For example, a small solar light manufacturer with 200-300 people may want to optimize its production line. It may have collected a lot of data over time and may even have zeroed in on the bottleneck in its process. But it may not have an IT team well versed in data, AI, or machine learning, and so may not be aware of how data and AI can be applied to solve the problem in its process.

We do not always require huge amounts of data for building models or running AI projects; in most cases, it is the quality of the data that matters. Some small companies may well have good-quality data. Our recommendation to such small companies is to engage firms like NUSA Lab, which can help them deploy AI models to solve their business problems. If they do not have relevant data, we can work with them to collect the data required to train the AI model.

Question 2
What do CEOs need to know before they embark on an AI/ML project so they can succeed?  

CEOs need to have the following information before they embark on an AI/ML project:
  • Clarity on the problem that is relevant to the business
  • Understand the value they expect to gain from AI. Is it optimizing operations, reducing cost, or improving customer satisfaction?
  • Understand the reason for executing the AI project: is it to really solve a problem or to explore new technology?

Understanding whether AI is a disruptive technology in your industry and how it can be leveraged will be a critical skill for successful CEOs, both now and in the future.

Question 3
What methodology is used in successful AI/ML projects?

NUSA Lab (N-U Sigma U-Square Analytics Lab LLP) has developed a framework that is proven to be effective. Here is our framework:

We start with the data and look for patterns or problems/issues/challenges where data is available. Otherwise, we start with the business problem/issue/challenge in discussion with the organization; in that scenario, we suggest what data should be collected if it is missing or only partially available. We then analyze the data, understand it, and clean it up using the relevant methods.
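The analyze-understand-clean step described above can be sketched in a few lines of Python. The field names and the outlier cutoff below are hypothetical, purely to illustrate the kind of profiling and cleanup involved:

```python
# A minimal sketch of profiling and cleaning raw production data.
# Field names and the 1000-unit outlier cutoff are made up for illustration.
from statistics import median

raw = [
    {"units_produced": 120,  "line_temp_c": 61.2},
    {"units_produced": 118,  "line_temp_c": 60.8},
    {"units_produced": None, "line_temp_c": 61.0},   # missing reading
    {"units_produced": 125,  "line_temp_c": None},   # missing reading
    {"units_produced": 9999, "line_temp_c": 60.9},   # obvious sensor glitch
]

# Understand the data: count missing values per field.
missing = {k: sum(r[k] is None for r in raw) for k in raw[0]}

# Clean: drop implausible outliers, then fill remaining gaps with the field median.
plausible = [r for r in raw if r["units_produced"] is None or r["units_produced"] < 1000]
for key in ("units_produced", "line_temp_c"):
    med = median(r[key] for r in plausible if r[key] is not None)
    for r in plausible:
        if r[key] is None:
            r[key] = med
```

Real projects would do this with a dataframe library and far richer checks, but the shape of the work is the same: profile first, then repair.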

Once the cleaned data is available, we determine a suitable algorithm to apply so that the model learns to generalize from the data. We build the AI/ML model, optimize it, and validate it, applying the relevant optimization methods until we get a highly accurate, well-generalized model.
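The build-optimize-validate loop above can be illustrated with a deliberately tiny example: a one-parameter "model" (a single threshold) tuned on a training split and then checked on a held-out validation split. The data and model are toys, but the loop is the same one real projects run:

```python
# Toy build-optimize-validate loop: tune a threshold classifier on a training
# split, then confirm it generalizes on held-out validation data.
data = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.9, 1),
        (0.2, 0), (0.8, 1)]
train, valid = data[:6], data[6:]

def accuracy(threshold, rows):
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)

# "Optimize": pick the threshold with the best training accuracy.
best = max((t / 10 for t in range(1, 10)), key=lambda t: accuracy(t, train))

# "Validate": score the tuned model on data it never saw during tuning.
val_acc = accuracy(best, valid)
```

In practice the "threshold" is replaced by model hyperparameters and the search by cross-validation, but the separation of tuning data from validation data is the essential idea.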

We also apply any relevant recent research that adds value to the project. Then we deploy the model to production and keep comparing its output against the expected output. Along the way, we carry out further research and use it to recalibrate the model as and where required. We then redeploy the recalibrated model to ensure that it remains valid despite the changes happening in the ecosystem. In the current world, change is the only constant.

Question 4
What skills do you need to implement an AI/ML project?

The following are the skills typically required to implement an AI/ML project effectively:
  • Determining the right business case/problem to be solved
  • Allocating the right resources – Data Scientist, Data Plumber, Data Engineer, etc.
  • Determining the data needed to train the AI/ML model effectively
  • Exploring the data to understand its completeness and correctness
  • Determining the appropriate method/algorithm to be used
  • Determining appropriate validation methods and optimization methods
  • Understanding if the model has appropriately generalized and is not overfit or underfit
  • Providing appropriate data stream to the model and eliminating any Bias before deploying the model on the production system and
  • Recalibrating the model on a need basis with any changes
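The "not overfit or underfit" check in the list above can be made concrete by comparing training and validation scores. The gap and floor thresholds below are illustrative choices, not standard values:

```python
# Flag overfitting (big train/validation gap) or underfitting (both scores low).
# The 0.10 gap tolerance and 0.70 floor are illustrative, not standard values.
def diagnose(train_score, valid_score, gap_tol=0.10, floor=0.70):
    if train_score - valid_score > gap_tol:
        return "overfit"    # memorized training data, generalizes poorly
    if train_score < floor and valid_score < floor:
        return "underfit"   # too simple to capture the pattern at all
    return "ok"
```

A model that scores 0.99 on training data but 0.70 on validation data would be flagged as overfit; one scoring 0.60 on both would be flagged as underfit.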

Question 5
How do you know whether you have the right data, and enough of it, to achieve AI/ML success?

By analyzing the data related to the problem to be solved, we can determine whether the right data has been collected and is available. Normally, we guide organizations on what data is required based on the problem/issue being solved. Nowadays, most organizations have a good amount of data, as most of them have moved to digital platforms. However, the quantity of data is secondary; the quality of the data matters most. There are various ways we can augment the available data, and it is often possible to source relevant data from third-party agencies. If the problem/issue/challenge makes business sense and we have quality data, we can always proceed.

Bonus Questions
Dr. Hodeghatta answered all the questions I sent to him, so I have included them under bonus questions.

Can you explain supervised learning, unsupervised learning, and reinforcement learning? When would you use one over the other, and can you use all three with the data you have?

We use "supervised learning" for predicting a category or a class given that the model is trained based on the data whose categories are already labeled. When we say labeled, somebody, either from their experience or other means, has already classified the record into specific classes. Regression, Classification are some of the areas where we use Supervised Learning. We use unsupervised learning when we do not know the categories/classes to predict.

In that case there is no labeled data to train the model, and it may not be possible to label it, either because of a lack of evidence or because of the high cost of labeling. The model itself identifies classes by grouping the data based on some measure (normally a distance measure). Clustering is one of the methods that fall under unsupervised learning. We use reinforcement learning particularly in scenarios where a reward is involved and later steps depend upon earlier steps.
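The contrast between the first two learning types can be shown in a few lines on 1-D points. The values and labels are invented for the example; a one-pass 2-means grouping stands in for real clustering:

```python
# Supervised: labeled data -> predict the class of a new point (1-nearest neighbor).
labeled = [(1.0, "small"), (1.2, "small"), (8.0, "large"), (8.5, "large")]

def predict(x):
    return min(labeled, key=lambda p: abs(p[0] - x))[1]

# Unsupervised: no labels -> group points by a distance measure
# (a single pass of 2-means, seeded with the min and max points).
points = [1.0, 1.2, 8.0, 8.5]
c1, c2 = min(points), max(points)
clusters = {c1: [], c2: []}
for p in points:
    clusters[min((c1, c2), key=lambda c: abs(c - p))].append(p)

# Reinforcement learning has no fixed dataset at all: an agent acts, receives
# rewards, and later choices depend on earlier ones (not sketched here).
```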

If a CEO needs to see something quick before providing more funding for an AI/ML project, what approach do you suggest the CEO take?

Analyze the data and show a pattern, or carry out a small proof of concept and show the possible value.

How can a CEO determine whether a data scientist is any good? What kind of questions should he be asking the data scientists he talks to?

The CEO should look for the following characteristics of an effective data scientist:
  • Should be able to understand the business problem when explained – open mind with good listening skills and understanding capability
  • Should be able to suggest the data to be collected based on the discussion with relevant domain experts
  • Should be able to determine the right algorithms, optimization methods, and validation methods to be applied
  • Should have good interpersonal skills and communication skills to deal effectively with other relevant stakeholders
  • Should be able to apply relevant research to the project at hand to deliver a high level of benefit to the company

This is not a conclusive and exhaustive list, but it covers some of the key requirements.

Since you hear about AI/ML everywhere, how can a CEO determine whether a problem lends itself to automation or machine learning?

Automation is straightforward: you already know the rules of the game. You program those rules, test that they produce the required output, and deploy the solution or application. In the case of AI/ML, the patterns or rules are not known. The rules are learned by the AI/ML models and are generalized so that the model works effectively on a wide range of applicable data values.

Can you explain how to mitigate or even eliminate bias in machine learning models and AI systems? Is the problem mainly with the data or algorithms? When you try to mitigate bias, is it possible to overcorrect so that there is a different kind of bias? 

A good data scientist always tries to eliminate bias in AI/ML models by understanding whether the model has generalized well. He or she applies various types of validation over a wide range of possible data points, using appropriate evaluation methods, to confirm that the model has generalized well. A well-generalized model is typically not biased. Experimentation is key to a scientist's success: a data scientist should not hesitate to experiment to confirm that there is no bias, and ideally should not carry any preconceived notions. Dr. Umesh Hodeghatta will conduct a detailed webinar on bias in AI/ML models on December 16th, 2020.
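One simple bias experiment of the kind a data scientist might run is comparing model accuracy across subgroups of the data. The records and groups below are invented for illustration:

```python
# Compare model accuracy across subgroups; a large gap between groups
# suggests the model has not generalized fairly. Records are made up.
records = [  # (group, prediction_correct)
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

def accuracy_by_group(rows):
    out = {}
    for group, correct in rows:
        hits, total = out.get(group, (0, 0))
        out[group] = (hits + correct, total + 1)
    return {g: hits / total for g, (hits, total) in out.items()}

rates = accuracy_by_group(records)
gap = max(rates.values()) - min(rates.values())
```

Here group A gets 75% accuracy and group B only 25%, a gap no well-generalized model should show; real bias audits use many more metrics than raw accuracy, but the per-group comparison is the starting point.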

You can register to learn more about "Bias in AI":

Three Book Recommendations on AI/ML for CEOs

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel
Small Data: The Tiny Clues That Uncover Huge Trends by Martin Lindstrom
Analytics in a Big Data World: The Essential Guide to Data Science and its Applications by Bart Baesens

Dr. Umesh Rao Hodeghatta's Contact Info:


I want to thank Dr. Hodeghatta for answering the questions about a very important subject that is changing our world. 

I am an author, speaker, and executive coach. I help people thrive on high-stakes stages, whether for a job interview, career advancement, a sales presentation, or a high-stakes speech. I am the author of a practical book on speaking titled Winning Speech Moments: How to Achieve Your Objective with Anyone, Anytime, Anywhere. The main idea of the book is that if you want people to remember your speech and take action, you must create a winning speech moment.

Please download the free speech checklist I created that I always use to create a winning speech for any occasion. 
Please contact me if you would like to discuss how you can work with me. If you are interested in inviting me to give a Zoom talk (during the pandemic) on job interviewing, career development, or high-stakes speaking, you can reach me at or 732-847-9877.
Note: if you are an author, a content producer, or otherwise interesting, and would like me to interview you with five questions and then publish it as a blog post and promote it on LinkedIn, Twitter, and Facebook, please contact me.

Sunday, 8 November 2020

Challenges of Data Privacy In AI and Machine Learning

On 11/06/2020, I discussed "Data Privacy and AI" during the "Analytics and Data Privacy" conclave organized by Symbiosis University.

The audience asked some excellent questions; here are my answers addressing the challenges of AI and data privacy.


1. Can AI and data privacy co-exist?

·        Yes; for most AI/ML analysis, we anonymize the privacy data appropriately before carrying out the analysis.

·        For some AI/ML analyses, such as a recommendation engine where some private information is required to provide the recommendation, we ensure that only the data that is required is used, without exposing any privacy data.

·        We ensure acceptance of the privacy policy by the client where applicable.

·        Additionally, the CISO and the Privacy Officers exercise the required controls on all the activities carried out by the organization, including AI projects.

·        In some cases, we make sure privacy data such as SSNs, Protected Health Information (PHI), and other personal data is masked before providing data to AI/analytics teams. For example, we mask the first 5 digits of an SSN or the first 12 digits of a credit card number.
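This masking step can be sketched in a few lines. The formats assumed here (an 11-character "ddd-dd-dddd" SSN string and a 16-digit card number) are illustrative:

```python
# Mask identifiers before handing data to AI/analytics teams:
# hide the first 5 digits of an SSN and the first 12 digits of a card number.
def mask_ssn(ssn):
    # "123-45-6789" -> "***-**-6789"
    return "***-**-" + ssn[-4:]

def mask_card(card):
    # Keep only the last 4 digits, regardless of dash/space formatting.
    digits = card.replace("-", "").replace(" ", "")
    return "*" * 12 + digits[-4:]
```

Masking like this is one-way by design: the analytics team can still group and join on the masked field's visible suffix if needed, but the full identifier is never exposed.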

2.  What steps are you taking in your organization to protect customers' privacy while providing data access for AI and Machine Learning projects?

The following are some of the steps to consider:

·        Knowing exactly where "privacy data" resides on our systems, how it is stored and processed, and where this data is duplicated, e.g., for backup purposes

·        Using proper encryption of privacy data both when it is stored on the systems and when it is transmitted

·        Ensuring that privacy data is not duplicated unless it is absolutely required for business-enabling purposes

·        Providing access to privacy data only on an absolute business-need basis

·        Anonymizing the privacy data while carrying out any analysis (as far as possible)

·        Taking utmost care to process the data only for purposes the customer has consented to

·        Ensuring that the data is deleted/archived on a timely basis as per the IT security policies and procedures

·        Knowing exactly what data is considered "privacy data" as per the various statutes and regulations, e.g., HIPAA and the GDPR (General Data Protection Regulation)

·        "Anonymizing" data is the common practice to protect "privacy data"
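One common way to anonymize identifiers before analysis is pseudonymization: replacing each ID with a keyed hash so records stay linkable across tables but are not readable. The salt below is a placeholder, not a real secret, and a production system would manage it like any other key:

```python
# Pseudonymize an identifier with a salted SHA-256 hash: the same input
# always maps to the same token, so joins still work, but the original
# ID cannot be read off the token. SALT is a placeholder, not a real secret.
import hashlib

SALT = b"project-specific-secret"

def pseudonymize(customer_id: str) -> str:
    return hashlib.sha256(SALT + customer_id.encode()).hexdigest()[:16]
```

Note that pseudonymized data is still regulated as personal data under the GDPR, since the mapping can be reversed by whoever holds the salt; true anonymization requires breaking that link entirely.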


3. Why is Data Privacy such a concern when it comes to AI?

·        First of all, organizations are concerned about adhering to specific standards and government regulations.

·        Certain information is private, sensitive, or secret, and such data is restricted.

·        Such information is classified as "confidential" and needs to be protected.

·        Compromising sensitive personal data may damage an organization's reputation, violate privacy rights, and cause loss of business.

·        If any confidential data breach happens, senior executives are answerable to investors, government regulators, and consumers.

·        For instance, sharing medical data with pharmaceutical companies and hospitals for medical research, or for finding patterns in the usage of a particular drug, can violate a patient's privacy even when sensitive information is masked.

·        Hence, data privacy is a big challenge, not just for AI projects but in general.


4. How much is the end user aware of what personal data is being collected and shared?

·        The answer is "it depends." However, over time, awareness among the user community with respect to the privacy of personal data has increased.

·        Sometimes users simply accept the privacy policies of organizations or software downloads without reading them, thus allowing organizations to use personal data beyond what the user intended to share.

·        How many of us actually read the End User License Agreements (EULAs)?

·        Because of recent incidents such as Cambridge Analytica, consumers are becoming aware of how their data is being used.

·        Also, various laws are being introduced across the globe to protect the privacy of personal data. These include the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act of 2018 (CCPA).

5. What is your view on the data privacy regulations that have been announced recently?

·        The most recent regulations are the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA 2018). Both aim to protect customer "privacy."

·        Some personal data is collected by organizations forcibly, and some is not.

·        For example, Google Maps, Google Search, and Facebook are collecting all your data.

·        Personal data is any information that relates to an individual who can be directly or indirectly identified. Names and email addresses are obviously personal data. Location information, ethnicity, gender, biometric data, religious beliefs, web cookies, and political opinions can also be personal data.

·        Key regulatory points of the GDPR include transparency about how personal data is used, how it is stored, who has access to it, and the purpose of collecting it.

·        The GDPR applies to the EU region.

·        Similarly, the California Consumer Privacy Act (CCPA 2018) applies to California state residents.

·        In July 2019, New York passed the Stop Hacks and Improve Electronic Data Security (SHIELD) Act. This law amends New York's existing data breach notification law and creates more data security requirements for companies that collect information on New York residents.

·        The consumer has the right to access, the right to rectify, the right to delete, and the right to object.

·        These regulations give consumers more control over the personal information that organizations are collecting.


6. Do you think the evolution of AI technology over the next 5-10 years will be blocked by data privacy?

·        Yes, very much. It is true that privacy requirements sometimes block the use of data for AI purposes.

·        In particular, organizations dealing with sensitive and private data, such as healthcare or finance, have lagged due to regulatory constraints to protect users' data.

·        An AI team may comprise temp workers and contractors, not just employees. Organizations are concerned about data leaks by the people working on AI projects. Hence, organizations exercise caution in providing data access to the right people.

·        A leak of privacy data can lead to huge penalties and a bad reputation for the company, and the consequences can be severe.

·        Hence, AI projects normally require data access approval from the top of the C-suite, to ensure that the data made available for AI purposes is used to solve critical business problems or challenges.

·        I strongly suggest that the AI team report to one of the C-suite executives, to ensure that there is authority and accountability at the same time.

7. How do data privacy regulations affect businesses and innovation as a whole?

·        One thing you have to understand is that AI is delivering significant benefits to organizations: increased productivity, process optimization, reduced downtime, increased product reliability, more proactive preventive maintenance, prescriptive analysis, and additional business generation.

·        But privacy is equally important, and to avoid any possible breach of data privacy, organizations have to invest money in putting proper controls in place so that their IT systems are secured and protected from threats.

·        Though there is an initial cost, the penalty for a breach of privacy data, or for negligence, is much larger.

·        Hence, it is to organizations' advantage to spend the additional money to protect their data by implementing proper controls, rather than paying a huge penalty later.

·        Once data privacy meets the standards, and all the regulations and proper controls are in place, approval can be granted for accessing the data required to solve AI/machine learning/analytics business challenges.

8. Should we as consumers be given more choice in how our PII data is used by AI algorithms?

·        Yes. Consumers should be aware of what counts as "personally identifiable information" (PII).

·        Consumers should also be aware of how their data is collected and the purpose for which it is collected.

·        They should know whether the data is used for research purposes or for monetary benefit.

·        Based on all this information, consumers should be able to decide whether or not to give consent.

·        Consumers have to read the data agreement carefully to check whether their data can be sold to others for monetary benefit.

·        Consumers should have the right to modify the data, erase the data, or decline its collection.

·        Consumers should be aware that most AI and analytics work can be done by appropriately anonymizing the privacy data.


Other Important factors to consider:

·        Cryptographic methods like homomorphic encryption, garbled circuits, secret sharing, and secure processors can help, as can the generation of "synthetic data."

·        "Synthetic data" can be used without risking the privacy of users.

·        Anonymization removes some of the informational value of the data; it can distort or completely destroy important correlations.

·        Also, Intel's SGX system enables a secured Trusted Execution Environment (TEE), which guarantees privacy through hardware.
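The synthetic-data idea above can be sketched in its simplest form: fit a distribution to real values, then sample new values that preserve the statistics without exposing any real record. Real synthetic-data generators model joint distributions and correlations and are far more sophisticated than this toy, and the income figures here are invented:

```python
# Toy synthetic-data sketch: fit a normal distribution to (made-up) real
# incomes, then draw fresh samples with the same mean and spread.
import random
from statistics import mean, stdev

real_incomes = [42_000, 48_000, 51_000, 55_000, 60_000]
mu, sigma = mean(real_incomes), stdev(real_incomes)

random.seed(0)  # fixed seed for reproducibility of the sketch
synthetic = [random.gauss(mu, sigma) for _ in range(1000)]
```

The synthetic sample can be shared with an analytics team for aggregate analysis while no individual's actual income ever leaves the organization.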

Tuesday, 29 September 2020

Challenges of Executing AI Machine Learning Projects


Date: 10/27/2020

Challenges of Executing AI Machine Learning Projects

Umesh Rao Hodeghatta, Ph.D

According to a recent report in the Wall Street Journal, AI project failure rates are near 50%, and more than 53% of projects terminate at the proof-of-concept level and never make it to production. A Gartner report says that nearly 80% of analytics projects do not deliver any business value; that means for every 10 projects, only 2 are useful to the organization. Let us pause here a moment: rather than looking at what makes AI projects fail, let's look at the challenges involved in AI projects and find a way to overcome them.

AI projects are different from traditional software projects. Typical software projects consist of well-defined software requirements, high-level design, coding, unit testing, system testing, and deployment, along with beta or field testing. Organizations are now adopting the Agile process instead of the traditional V or waterfall model, but all the steps mentioned are still valid.

However, the methodology of AI and machine learning projects is different. Our experience working on many AI/ML projects has given us insights into some of the challenges of executing them. We are also in regular touch with senior executives and thought leaders from different industries who understand the success formula. The following discussion is based on our practical experience and knowledge gained in the field.

Successful execution of AI projects depends on the following factors:

1. Clearly aligned Business Expectations

2. Clarity on Terminologies

3. Meeting Data Requirements

4. Tools and Technology

5. Right Resources

6. Understanding Output Results

7. Project Planning and the Process


Before I go further into the details, I want to reiterate that there are two sectors of organizations that are keen on AI projects.

The first category is tech(nology) companies like Microsoft, IBM, Google, Amazon, Intel, and Cisco. They have been inventing technologies and solutions and have developed thousands of software products and applications. For these tech companies, succeeding with AI projects is no big deal! They know the technology very well, data is not a problem, requirements are noticeably clear, and they have all the resources and experts who have been working on the technology, and even on machine learning and AI projects, for decades. AI or machine learning is not a new term to them: "old wine in a new bottle."

The second category is the rest: non-tech companies that are exploring the benefits of initiating AI in their organizations, as every other big-company CEO is talking about AI.

These non-tech companies only adopt the technology, applications, and tools developed by the first-category tech companies and are curious to apply the new technology (AI, machine learning, deep learning, neural networks, reinforcement learning, etc.) to solve their business problems. AI becomes a new initiative in the CEO's yearly plan and trickles down to the bottom of the pyramid. The subsequent levels of executives have to adopt the new AI term in their projects, but they may lack the expertise and skills required to implement it and succeed.

Let’s dive into each one of the challenges I mentioned in detail.

1. Clearly Aligned Expectations

Every CEO in an organization is obsessed with AI and asks his people, "What percentage of our business decisions are we making with help from AI?"

Aligning the business is necessary for an AI project. Whatever the technology, AI or non-AI, every CEO and senior executive is interested in these three factors:

1. Can AI boost customer satisfaction levels?

2. Is AI helping make quick business decisions?

3. Is AI helping overall company performance and ROI?


Most projects fail because of a mismatch of expectations. Software projects have well-defined requirements, and they may not change much until the product is released. Further, the Agile methodology provides the flexibility to handle changes in a better way.

Whereas in AI and ML projects, understanding the requirements itself can be challenging. It is important to know the problem an organization wants to solve and whether it is solvable using AI/ML techniques.

Sometimes an organization defines a problem that can be solved without AI or ML. In other cases, even though the organization has a well-defined business problem, the project manager has aligned the resources, and the project has kicked off, the data scientists report that they need "data" to solve the problem, and the team may not get access to the right data for a long time, or the organization may not have the required data at all. Months pass by, and the project is still in its initial stage with no progress toward solving the problem. This can make management unhappy and lead to the project being cancelled.

Sometimes the technical people may define a problem the business is not really interested in. If the organization is not interested, then even if the problem is solved, the solution may not be deployed, leading to wasted time and resources. It is also possible that the technical people define the problem wrongly, leading to unusable solutions.

Hence, based on our experience, we feel it is necessary to set expectations before you agree to start the project. Know what problem the organization is trying to solve, whether the problem is practical, whether it is really a machine learning project or just an automation software project, whether you have proper data, whether you are authorized to access the data, and so on. You must set the organization's expectations properly while explaining the data requirements and whether the problem itself is AI/ML-related or not.

When an AI project is initiated, it is important to set expectations with the business and the data science/AI team:

1. Deployment: how long it takes to deploy, how you integrate with the production systems, and some of the concerns around data/cyber security.

2. What is the value-add of the new AI system?

Always remember, every CEO is interested in leveraging cutting-edge technology to:

a. Achieve customer satisfaction

b. Make quick business decisions

c. Improve company performance and ROI

In summary, set proper expectations:

-          Is it really an AI/ML problem?

-          Set the DATA expectations.

-          Align the business with the problem and requirements.


2. Clarity on Terminologies

Because of the current trend in technologies and the buzz it creates, people hear too many terms: machine learning, data science, business analytics, data analytics, data visualization, data modeling, predictive analytics, diagnostic analytics (something I had never heard until recently), prescriptive analytics, AI, deep learning, reinforcement learning, and so on. Are these terms the same? If not, what is the difference, and which term means what? This is very confusing, especially for people who are new to the field, because various people define each term differently, and when any such term is used, people are not clear what it really means in context.

Each industry may differ in its definition of these terms: what is data science for Home Depot may be machine learning for Google, and vice versa. Hence, based on our experience, it is not worth defining these terms and trying to convince the company executive about the "confusing terminology." Instead, it is better to focus on the problem and how it can be solved, whether by applying supervised machine learning or simple descriptive analytics.

3. Data Requirements

For business, data is everything. Agree?

Data is information.

Data is MONEY.

For business, DATA is sensitive.

Business DATA is under surveillance, monitored by government regulations and other regulatory bodies.

But for AI/analytics projects, without "DATA" there are no analytics or AI/machine learning projects. Data plays an important role in AI/ML projects.

However, data is information; data is the secret of the business. Exposing data is exposing your business. Organizations are overly sensitive about their business data, and moreover, CEOs and CFOs are held responsible for any breach of data, for data falling into the wrong hands, or for data being used for the wrong purpose. If even a small data breach happens, they must answer to investors, government regulators, and consumers. How much risk can they afford to take? Hence, organizations have a strict process governing who should access data and what data should be accessed. Several approval processes can be involved before access to data is granted, and the whole process may simply take a lot of time. Or else we may get only a portion of the data, which may not be useful unless other related portions are also obtained.


AI/ML depends on DATA. Hence, the AI project team must understand the organization's challenges with respect to data and work closely with them: understand the company process, understand data compliance requirements, and so on.

Even though the team has planned a well-defined AI project, without data it simply has to wait for the approval process to complete. You can imagine what the weekly project updates, or the daily scrum stand-up updates, look like in the meantime.

If the AI project involves supervised machine learning, can you get labeled data? Why would a company have labeled data in the first place? Who will label the data for you? Does anyone there even understand what labeled data is? What is the cost of the labeling exercise? Is the company willing to bear this cost?

Imagine you are working on a healthcare project classifying X-ray images. You have to ask hospitals or healthcare providers to give you not just X-ray images but X-ray images labeled with the corresponding findings. Do you think they have X-ray images labeled according to your defined classes?
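To make the labeling question concrete, here is a minimal sketch of the kind of sanity check a team might run on a labeled-image manifest before any modeling. The file paths and class names below are invented for illustration, not taken from any real dataset:

```python
import csv
import io
from collections import Counter

# Hypothetical manifest: each row pairs an X-ray image file with the
# class a radiologist assigned to it.
manifest_csv = """image_path,label
xrays/0001.png,normal
xrays/0002.png,pneumonia
xrays/0003.png,normal
xrays/0004.png,effusion
xrays/0005.png,pneumonia
"""

# The classes the project agreed on up front.
expected_classes = {"normal", "pneumonia", "effusion"}

rows = list(csv.DictReader(io.StringIO(manifest_csv)))
counts = Counter(row["label"] for row in rows)

# Two basic checks: no labels outside the agreed set, and at least one
# example per class.
unknown = set(counts) - expected_classes
missing = expected_classes - set(counts)
print("label counts:", dict(counts))
print("unknown labels:", unknown, "| missing classes:", missing)
```

Even this trivial audit often surfaces the gap between the labels a provider actually has and the classes the project needs.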

Inaccurate data can lead to wrong predictions and wrong decisions.

Too little data may not be useful.

Hence, you must set the "data" expectations at the beginning of your project. Otherwise, you are bound to fail!

4. Tools and Technology

Tools support the technology required to solve a specific problem, and not every tool supports every technology. Different technologies are applied to different problems. For example, using deep learning to solve a simple credit-approval problem may be overkill; using a decision tree for image classification may not give the desired results; and there is no need to apply machine learning to clean a database.

There are open-source tools and commercial tools. Choosing the appropriate tool depends on the organization's long-term goals and objectives. Many factors play a role in tool selection, including the IT infrastructure and production environment and the tool's compatibility with them. For commercial tools, acquisition cost may also be a factor. What matters is choosing tools that support the right technology and integrate well with your existing environment.

Commercial tools such as Power BI and Tableau are extremely good at data visualization and descriptive analytics.

You don't need expensive tools for data plumbing and cleaning jobs; any general-purpose programming environment can do the data preprocessing required to build an ML model.
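As a small illustration of that point, here is a plain-Python sketch of two common cleaning steps: imputing a missing numeric value with the column mean, and discarding rows with an invalid category. The field names and values are made up for the example:

```python
# Toy records with one missing age and one invalid category.
records = [
    {"age": 34,   "segment": "retail"},
    {"age": None, "segment": "retail"},
    {"age": 51,   "segment": "corporate"},
    {"age": 29,   "segment": "???"},      # bad category -> discard the row
]

valid_segments = {"retail", "corporate"}

# Mean of the observed ages, used to impute the missing one.
observed = [r["age"] for r in records if r["age"] is not None]
mean_age = sum(observed) / len(observed)

cleaned = [
    {**r, "age": r["age"] if r["age"] is not None else mean_age}
    for r in records
    if r["segment"] in valid_segments
]
print(cleaned)
```

No specialized tooling is involved: a dozen lines of standard-library code handle both steps.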

5. Resources

Resource allocation varies from organization to organization. What defines a data scientist, what a data scientist's tasks are, whether you should hire a Ph.D., and what type of "scientist" is required to handle your data are questions people in the industry differ and argue on. We will not get into the debate over who a "data scientist" is or what the role entails.

We believe the data scientist should spend most of her or his time on developing AI/ML/analytical solutions and on deciding what type of model to create, and should seek help from software engineers in preparing the data. Data preparation may involve handling missing values with appropriate techniques, discarding or correcting wrong data, deleting certain columns, merging multiple columns into a new feature, transforming and scaling the data (in short, data wrangling and data plumbing), and reducing the features to a meaningful, smaller set. This requires coding skills in Python, C++, Java, or any other programming language. With text, preprocessing is more involved: tokenizing, removing punctuation, removing unnecessary junk characters, removing emojis, and so on, which requires solid knowledge of regular expressions. We believe anyone with good programming skills can do this.
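The text-preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration of the kind of regular-expression cleanup meant here, not a production pipeline:

```python
import re
import string

def preprocess(text):
    """Basic text cleanup: lowercase, drop non-ASCII junk (e.g. emojis),
    strip punctuation, then tokenize on whitespace."""
    text = text.lower()
    text = text.encode("ascii", "ignore").decode()          # remove emojis/junk
    text = re.sub(rf"[{re.escape(string.punctuation)}]", " ", text)
    return text.split()

tokens = preprocess("Great product!!! Would buy again 😀 #happy")
print(tokens)
```

A real pipeline would add steps such as stop-word removal or stemming, but the core work is exactly this kind of pattern-based substitution.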

Under our resource definitions, a data scientist should spend more time creating different deep learning/machine learning models, training them, understanding their output, improving performance by tuning model parameters, deciding which model best suits the business problem, and finally helping deploy the model. We believe this requires advanced knowledge of probability, a deep understanding of statistics and of the different machine learning algorithms (not just how to use the scikit-learn or TensorFlow libraries), and the ability to build various deep networks.

Similarly, data or business analysts can perform descriptive analytics using tools such as MSBI or Tableau.

In our experience, hiring a data scientist based on a "coding test" may not be wise: you may end up hiring coders, not data scientists. Further, each project should be staffed with the right type of people, with the right competence, to deliver the best results. The wrong type of resources, even in good numbers, can delay the project.

6. Output Results, Trust and Ethics

How the output of AI/ML models is interpreted is where the trust and ethics of an AI project are proven. Wrong interpretation of results from a model that has not been completely and effectively validated can lead to misguided decisions, jeopardizing the very purpose for which the model was deployed.

What is the acceptable accuracy? What kind of samples were selected to build the model, and are they sufficient? Have we considered the entire set of relevant data, or have we omitted an important aspect or feature, leading to seemingly good but misleading results? For a model giving general information, such as whether now is the right time to book a flight ticket, this may not matter much. However, if the model bears on the life and safety of human beings, then even a few false positives or false negatives can seriously undermine its use in spite of high accuracy.

Further, different algorithms may have to be validated with different but relevant metrics; using the wrong evaluation metric can lead us in the wrong direction. Similarly, overfit models cannot be relied upon, as they produce different results on unseen or new data.
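A classic instance of metric choice going wrong is class imbalance. The short, self-contained sketch below (with made-up counts) shows a model that never flags the rare class: its accuracy looks excellent while its recall is zero.

```python
# 5 positive cases among 1000; the model predicts "negative" for everyone.
y_true = [1] * 5 + [0] * 995
y_pred = [0] * 1000

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

accuracy = correct / len(y_true)   # looks great: 99.5%
recall = tp / (tp + fn)            # catches nothing: 0%
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

Judged on accuracy alone the model seems deployable; judged on recall, it is useless for exactly the cases that matter.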

We should be absolutely clear that the model has generalized the solution and is not overfit to the training data. Further, the evaluation metrics on the test/validation data, including new data, should give us the confidence that the model is deployable for production use. Such decisions should rest purely on whether the model fulfills its purpose and objective.

The output of an AI/machine learning model depends on several factors, but mainly on the data. Say your data scientists have developed a face recognition system with an accuracy of 99%, or a model that predicts a disease from its input data with an accuracy of 99.5%. The organization still has to decide whether to deploy the model.
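One way to frame that decision is to translate the headline accuracy into an absolute error count at the system's operating scale. The traffic figure below is an assumption chosen purely for illustration:

```python
# A 99%-accurate face recognition system, assumed to process one million
# recognitions per day (illustrative volume, not from the article).
accuracy = 0.99
daily_recognitions = 1_000_000

expected_errors_per_day = (1 - accuracy) * daily_recognitions
print(int(expected_errors_per_day))
```

Ten thousand expected misidentifications per day is a very different statement from "99% accurate," and it is the number a deployment decision should weigh.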

7. Project Planning and Process

Traditional software project planning has a well-defined process: timeline, milestones, deliverables, tasks and subtasks allocated to resources, risks and mitigation plans, and weekly tracking. All of these can be defined using standard methods and the considerable prior knowledge that already exists among the team.

The same planning cannot be applied to AI and ML projects because of the many unknowns and uncertainties. You may wonder: how can you execute a project with so many unknowns and uncertainties?

The success of an AI and machine learning project depends primarily on the data. The first step in any project planning is defining the business requirements. Since AI and ML projects rely on data, you have two choices: either you explore the data the business provides and then define the problems that can be solved with it, or the organization defines the business requirements and problem first, and you then explore the available data and check whether you can get what is required.

With the first choice you already have the data, and it may take some time to explore, analyze, and understand it and to define a problem that can be solved. However, the problem you define may not be relevant or appealing to the business, and they may reject your proposal. The entire time spent is then wasted.

On the other hand, if the business defines the problem, you must be clear about what data you need to solve it. Once the problem definition and requirements are clear, data exploration starts. If the data does not exist, you can either stop the project or put methods in place to start collecting it. If you have the data, the next step is to gather it from the applicable sources, consolidate it, and clean it as required; here, aspects like access permissions matter. The next step is to choose the AI/ML method and algorithm, run it against the cleaned data, and analyze and validate the results. Where the results do not meet expectations, we identify alternative methods or algorithms and/or tune the hyperparameters and carry out further analysis until we are satisfied with the results against the project's objective.

We use various metrics to measure the model's performance, and once validation gives the requisite comfort and the metrics indicate good generalized learning that meets the business's prediction requirements, we go ahead and deploy the model to production.
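The iterate-until-satisfied workflow just described can be compressed into a control-flow sketch. Every function below is a stub standing in for one of the stages named above (cleaning, training, validation); the names, the toy scoring rule, and the target threshold are all invented so the loop itself is runnable:

```python
def clean(data):
    # Stand-in for consolidation and cleaning: drop missing values.
    return [x for x in data if x is not None]

def train(data, step):
    # Stand-in for model training at one hyperparameter setting.
    return {"step": step, "n": len(data)}

def validate(model):
    # Toy validation score (percent); a real project computes real metrics.
    return 60 + 10 * model["step"]

def run_project(raw_data, target=85):
    data = clean(raw_data)
    score = 0
    for step in [1, 2, 3]:            # "tune the hyperparameters"
        model = train(data, step)
        score = validate(model)
        if score >= target:
            return model, score       # metrics give comfort: deploy
    return None, score                # revisit the method or the data

model, score = run_project([1, None, 2, 3])
print(model, score)
```

The point of the sketch is the shape of the loop: unlike a traditional software plan, you cannot know in advance how many passes it will take, or whether the loop exits at all.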

Figure 2: AI and ML Project Lifecycle

In both scenarios it is hard to define the project plan, milestones and deliverables, risks and mitigation plans, or sprint-by-sprint tracking. There are challenges at every stage: getting access to data, exploring it, labeling it for supervised machine learning, deciding which algorithm will give the best results, and deploying the model in the field. Hence, executing and managing an AI and ML project is much more challenging than other software projects.

It is a complete reversal of the traditional software project process.

To summarize,

The success of an AI project depends on the following factors:

1. Aligning Expectations

2. Setting Data Expectations

3. Using proper Tools and Technology

4. Aligning Right Resources

5. Asserting Output Results

6. Aligning Project Planning and the Process