The biggest challenge in applying Machine Learning is….

Machine Learning and AI applications are everywhere. Recently, Andrew Ng launched drive.ai self-driving cars. He aptly said, “The future is here.” But I must say, this future arrived pretty quickly!

Over time, and after many conversations with our customers, we have found that there are many challenges to address before we can apply machine learning or AI algorithms. Many of our customers want to see results quickly. We take them through methodical steps to avoid surprises.

Here are a few challenges in applying machine learning to solve business problems.

1. Do we know what problem to solve? This is the first question you should ask, and the biggest one. My customers want to train a deep learning model on the cloud, but when my team asks them deeper questions about what they want to solve, they do not have answers. Or, if they do have answers, the answers are not very clear.

This is where our team’s expertise comes in. We ask our customers questions to help them understand what problems they want to solve. The questions cover their business objectives, the current problem, the results they expect, their vision, and so on.

2. What data do we have? Machine learning and AI algorithms rely on data. To predict the future, you need to know past behavior, and to know past behavior, you need historical data. In most cases, you have data available. But the question is: is the data relevant? Is it clean? If you want to predict a customer’s next purchase, you need that customer’s historical transactions and demographic details. Another question: is having data enough?

3. Do you have labeled data? To apply ML classification techniques, you need labeled data. For example, we worked with one of our customers to automatically generate marketing headlines using deep learning models. For this problem we needed a lot of marketing articles with headlines labeled “good” or “bad” so that the ML engine could learn what is good and what is bad.

Similarly, in the classic problem of tweet sentiment analysis, you need tweets labeled as positive or negative before the ML engine can predict a new tweet’s sentiment.
Who will tag the headlines as “good” or “bad”? Who will mark each tweet’s sentiment as positive or negative?
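
To make the role of labels concrete, here is a minimal sketch in Python. The tweets, the labels, and the word-counting "model" are all made up for illustration; this is not the approach we used for any customer, and a real project would use a proper classifier. The point is only that `train` has nothing to learn from unless someone has already attached a label to every example.

```python
from collections import Counter

# Hypothetical hand-labeled tweets (invented for this sketch).
labeled_tweets = [
    ("love this phone great battery", "positive"),
    ("great service love it", "positive"),
    ("terrible battery hate this phone", "negative"),
    ("hate the terrible service", "negative"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def predict(counts, text):
    """Pick the label whose training words overlap most with the text."""
    scores = {
        label: sum(words[w] for w in text.split())
        for label, words in counts.items()
    }
    return max(scores, key=scores.get)

model = train(labeled_tweets)
print(predict(model, "love the great battery"))
print(predict(model, "hate this terrible phone"))
```

With only four labeled examples the predictions are fragile, which is exactly why the question of who does the labeling, and how much of it, matters.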

4. Do you have trained people? This used to be the biggest challenge, but it no longer is. There are many online courses where one can learn both basic and advanced material. One has to push their limits and keep learning. I train my team through these courses. Some of them are free, while others cost as little as $10.

The challenges remain the same from customer to customer; they just take different shapes and sizes. We take our customers through methodical steps in solving their business problems. We make hypotheses, test them iteratively, present findings and outcomes, and proceed to the next milestone.

It’s always good to take baby steps in machine learning and AI problems.

Please get in touch with us for your data analytics and data science needs.

 

5-step approach to data analytics problems

Data Profiling – or simply data discovery. Based on your business objectives and the information you would like to extract, we examine your data with a variety of statistical measurements. These range from simple measures like mean and median to measures of spread like variance, standard deviation, and quartile analysis.

Outcome: a sense of data ranges and fields on which to build the next steps in the analysis
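
As a rough illustration of profiling a single numeric field, the sketch below uses only Python's standard `statistics` module. The `profile` helper and the `monthly_spend` numbers are hypothetical, chosen just to show the kind of summary this step produces.

```python
import statistics

def profile(values):
    """Return basic summary statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }

# Hypothetical monthly spend per customer (made-up numbers).
monthly_spend = [120, 95, 130, 110, 105, 400, 98, 115]
summary = profile(monthly_spend)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the mean sits well above the median because of the single 400 value, which is the sort of early signal profiling is meant to surface.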

Find systematic patterns using Regression – we apply regression techniques to identify significant relationships and the strength of the impact of independent variables on dependent variables.

Outcome: the key variables impacting our outcome, and their strength, which can help us prioritize when factors conflict. Example: whether living in San Francisco or having a family is the more important factor in buying a car.
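
A minimal sketch of the idea, assuming a single independent variable and ordinary least squares written out by hand. The `fit_line` helper and the household data are invented for illustration; with several independent variables you would normally reach for a statistics library rather than this hand-rolled version.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Made-up data: household size vs. number of cars owned.
household_size = [1, 2, 3, 4, 5]
cars_owned = [0, 1, 1, 2, 2]

slope, intercept, r_squared = fit_line(household_size, cars_owned)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.2f}")
```

The slope is the "strength of impact" in this simple setting: how much the outcome changes per unit change in the variable. R^2 gives a sense of how much of the outcome that variable explains.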

Correlation analysis – looking to see if there are relationships between variables that are not immediately obvious. Example: are credit score and monthly income correlated? If yes, how does this impact my outcome?

Outcome: a set of highly correlated parameters/variables that will impact our business decisions.
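
For the credit score and income question, a Pearson correlation coefficient is one common way to put a number on the relationship. The sketch below computes it by hand; the `pearson` helper and both data columns are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up customer data for illustration.
credit_score = [580, 620, 660, 700, 740, 780]
monthly_income = [2500, 3100, 3000, 4200, 4800, 5500]

r = pearson(credit_score, monthly_income)
print(f"correlation: {r:.3f}")
```

A value near +1 or -1 means the two variables move together closely, so including both in a model may add little new information. A value near 0 means there is no linear relationship, though a non-linear one could still exist.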

Outlier analysis – are the outliers showing us new emerging trends, or are they just outliers? We need to check this in our data to capture trends early.

Outcome: a set of outliers with their likely future impact on the outcome. What is the onset of those outliers?
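
One simple way to flag candidate outliers is the interquartile range (IQR) rule, sketched below. This is an assumption on our part about a reasonable first pass, not the only method. The `iqr_outliers` helper, the conventional `k=1.5` multiplier, and the order counts are all illustrative.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up daily order counts; one day is far above the rest.
daily_orders = [52, 48, 50, 55, 47, 51, 49, 53, 120, 50]

flagged = iqr_outliers(daily_orders)
print(flagged)
```

Flagging is only the first half of this step. Whether a flagged value is noise or the start of a trend still has to be judged by looking at when it appeared and whether similar values follow.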

Cohort/cluster analysis – data segmentation. We combine the outcomes of the previous steps to come up with cohorts or segments to target in line with business objectives. This is where we discover consumer preferences, segment our data, and analyze micro data to improve our decision making.

Outcome: a set of variables and their values that define the targeted cohorts
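
As one possible sketch of segmentation, the code below runs a plain k-means on two-dimensional points. The choice of k-means, the `kmeans` helper, the starting centroids, and the (age, monthly spend) data are all assumptions made for illustration. In practice you would also scale the features first so that one variable does not dominate the distance.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Made-up customers as (age, monthly spend).
customers = [(22, 40), (25, 50), (24, 45), (48, 200), (52, 220), (50, 210)]

centroids, clusters = kmeans(customers, centroids=[(22, 40), (50, 210)])
for center, members in zip(centroids, clusters):
    print(center, members)
```

Each resulting centroid describes a cohort by its typical variable values, which matches the outcome described above: a set of variables and values that characterize each segment.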

Combined, these steps give you a full picture of your data, with results and outcomes expressed as visuals for better understanding.