What is LangChain: A Beginner’s Guide to Developing AI-Powered Applications

Introduction

With recent advances in technology, more and more online platforms are built on Artificial Intelligence. These platforms, such as OpenAI’s ChatGPT, have started a new wave. Type in a few words and you get a detailed, conversational answer in one place, without the lengthy research that used to be necessary. This technology has been useful for students, teachers, digital marketers, content writers, and many more. No wonder that since its release in 2022, ChatGPT has become highly popular among young people and working professionals.

Going back a few years, ChatGPT grew out of GPT-3, a Large Language Model (LLM) released in 2020. But what is the technology behind GPT-3? Put simply, it is a Large Language Model that uses deep learning methods to produce human-like text.

However, it was ChatGPT that made interest in Large Language Models soar. LLMs and generative AI weren’t that popular when they were first released, but after ChatGPT people started paying attention to this technology. As a result, there have been significant advances in LLMs.

Historically, it started with the release of Google’s “sentient” LaMDA. After that, the first open-source LLM, “BLOOM”, was released. Then OpenAI released its next-generation text embedding model and the GPT-3.5 models. Finally, building on these leaps in the LLM world, OpenAI released ChatGPT, which put LLMs directly into the spotlight.

Around the same time, the LLM wave brought along another useful framework: LangChain. LangChain was released by its creator, Harrison Chase, in October 2022.

An Overview of LangChain – An Open-Source Python Library 

LangChain is an open-source Python library that helps in the creation of LLM-powered AI applications. It comes with features and tools that are a boon for anyone wishing to develop software with it.

At its core, the LangChain framework is built around LLMs. It can be used for chatbots, summarization, generative question answering, and a lot more! The primary concept behind the library is its ability to connect different components, allowing the creation of sophisticated applications around Large Language Models. Chains consist of multiple components from several modules such as prompt templates, LLMs, agents, and memory.

Apps need to be super smart when it comes to understanding language, and that’s exactly where LangChain comes into the picture. Through the help of LangChain, it has become super easy to connect AI models with different kinds of data sources so that you can get customized natural language processing (NLP) solutions. LangChain helps to develop applications that are powered by a language model, in particular, a Large Language Model.

The technology used behind LangChain is beyond standard API calls as it is a data-aware agentic framework that enables connections with various data sources for a more enhanced and personalized experience. So, whether you’re a developer, a data scientist, or someone who has always been curious about the latest developments in NLP technology, this article has been made for you. Let’s explore how you can unlock the power of this framework in your organization.  

Key Features of LangChain 

LangChain offers the following 6 modules. 

  1. Large Language Models and APIs 

LangChain’s primary component is a Large Language Model (LLM). At the time of writing this article, LangChain supports 20+ LLMs. These include OpenAI, Hugging Face models, DeepInfra, and Cohere, to name a few. LangChain offers application programming interfaces (APIs) to access and interact with these models, so you can take advantage of their full potential across a variety of use cases. For instance, you could develop a chatbot that generates customized trip plans based on the user’s preferences and prior travels.

  2. Prompt Templates 

The primary input to an LLM is the prompt. It takes effort to come up with the right prompt for the desired output. Once the prompt is built, you can reuse it as a template for other tasks. LangChain makes this easy with prompt templates and provides several classes and methods to construct and manage prompts.
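The idea of a reusable prompt can be sketched in plain Python. This is only an illustration of the concept using the standard library, not LangChain's own template classes; the placeholder name `product` and the helper `build_prompt` are made up for this sketch.

```python
from string import Template

# Hypothetical template: the placeholder name "product" is invented for this sketch.
NAME_PROMPT = Template(
    "What would be a good company name for a company that makes $product?"
)


def build_prompt(product: str) -> str:
    """Fill the template's placeholder and return the finished prompt text."""
    return NAME_PROMPT.substitute(product=product)


print(build_prompt("colorful socks"))
```

The same template can be filled with a different product each time, which is the point of keeping the prompt wording separate from the values plugged into it.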

  3. Chains 

To create complex applications, you need to combine multiple components. LangChain provides interfaces for building chains. For example, you may use a PromptTemplate component with a memory component to create a chatbot. As another example, if you want to ask a bot follow-up questions, you may need a SimpleSequentialChain. Chains can also be used to add context to the conversation.
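A sequential chain boils down to feeding the output of one step into the next. The sketch below shows that pattern in plain Python under loud assumptions: `fake_llm` is a hypothetical stand-in for a real model call, and `make_prompt_step` and `run_sequential` are names invented here, not LangChain APIs.

```python
from typing import Callable, List

# A "step" takes text in and returns text out.
Step = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call so the sketch runs offline."""
    return f"<answer to: {prompt}>"


def make_prompt_step(template: str) -> Step:
    """Build a step that formats its input into a prompt and calls the model."""
    def step(text: str) -> str:
        return fake_llm(template.format(input=text))
    return step


def run_sequential(steps: List[Step], first_input: str) -> str:
    """Run the steps in order, feeding each output into the next step."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


chain = [
    make_prompt_step("Suggest a company name for a maker of {input}."),
    make_prompt_step("Write a slogan for the company {input}."),
]
print(run_sequential(chain, "colorful socks"))
```

Swapping `fake_llm` for a real model call would leave the chaining logic unchanged, which is why composing small steps is a convenient design.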

  4. Indexes 

LLMs by default do not have contextual information (your business- or domain-specific information). You can use LangChain Indexes to load your company documents and use specific LangChain classes for question-answering tasks over them.
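The underlying idea is to look up the most relevant document and place it in the prompt as context. Here is a deliberately naive sketch that ranks documents by shared words; it is an assumption-laden illustration (real setups typically use embeddings and a vector store), and the documents and function names are hypothetical.

```python
from typing import List


def score(question: str, document: str) -> int:
    """Count how many distinct question words also appear in the document."""
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words)


def retrieve(question: str, documents: List[str]) -> str:
    """Return the document that shares the most words with the question."""
    return max(documents, key=lambda doc: score(question, doc))


def build_context_prompt(question: str, documents: List[str]) -> str:
    """Put the best-matching document in front of the question."""
    context = retrieve(question, documents)
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


# Hypothetical company documents, made up for this example.
docs = [
    "Refunds are processed within 7 days of a return request.",
    "Our office is open Monday to Friday.",
]
print(build_context_prompt("How many days do refunds take?", docs))
```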

  5. Memory 

By default, LLMs and chat models are stateless: they don’t remember previous interactions. So if you build an application that needs to remember the earlier conversation, you need memory. LangChain provides interfaces for storing short-term and long-term memory.
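Because the model itself keeps no state, memory usually means storing earlier turns and sending them again with each new message. A minimal sketch of that idea follows; the class `ConversationBuffer` and its methods are hypothetical names for illustration, not LangChain's memory interfaces.

```python
from typing import List, Tuple


class ConversationBuffer:
    """Keeps earlier turns so each new prompt can include them (illustrative only)."""

    def __init__(self, max_turns: int = 5) -> None:
        self.max_turns = max_turns
        self.turns: List[Tuple[str, str]] = []

    def add(self, user: str, bot: str) -> None:
        """Store one exchange and drop the oldest ones beyond max_turns."""
        self.turns.append((user, bot))
        self.turns = self.turns[-self.max_turns:]

    def as_prompt(self, new_message: str) -> str:
        """Prepend the stored history to the new user message."""
        history = "\n".join(f"User: {u}\nBot: {b}" for u, b in self.turns)
        prefix = history + "\n" if history else ""
        return f"{prefix}User: {new_message}\nBot:"


memory = ConversationBuffer(max_turns=2)
memory.add("My name is Asha.", "Nice to meet you, Asha.")
print(memory.as_prompt("What is my name?"))
```

Capping the buffer at `max_turns` is one simple way to model short-term memory; long-term memory would need persistent storage, which this sketch leaves out.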

  6. Agents 

LLMs have limited knowledge. For example, OpenAI’s ChatGPT was trained on data up to September 2021. What if you need current knowledge? That’s where LangChain Agents come into play. Agents can call tools such as Google Search, Wikipedia lookup, and calculators to help our applications.

Installation and Setup of LangChain 

  1. You need Python on your system to set up and build applications using LangChain. 
  2. Then, using pip, install LangChain. Open your terminal and type: 
pip install langchain 

This command will install LangChain and all its dependencies. 

  3. While LangChain can connect to 20+ LLMs, in this example I have used OpenAI. To install the OpenAI SDK on your system, type: 
pip install openai 
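  4. Once the OpenAI SDK is installed, you need an OpenAI API key. You can put the key in a .env file (as OPENAI_API_KEY) and load it in your Python code. Loading the file this way needs the python-dotenv package (pip install python-dotenv). 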
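# Standard library and third-party imports
import os

import openai
from dotenv import load_dotenv
from langchain.llms import OpenAI

# Read the variables defined in the .env file into the environment
load_dotenv()

# Hand the key to the OpenAI SDK
openai.api_key = os.getenv("OPENAI_API_KEY")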

  5. Now you can use LangChain to interact with OpenAI with this sample code: 
llm = OpenAI(temperature=0.9) 
text = "What would be a good company name for a company that makes colorful socks?" 

print(llm(text))

3 Application Examples of LangChain 

With the help of LangChain, you can create complex applications by leveraging a large language model (LLM). LangChain may be utilized to construct a wide range of applications across various industries thanks to its adaptability, customization opportunities, and powerful components.

Examples of applications that make substantial use of LangChain include the following: 

  1. Text Summarization (Data Augmented Generation) 

Do you remember writing precis in school? It was indeed a tricky task. To summarize a long text into a tiny paragraph is a tiresome and time-taking process because one has to read the entire text for hours just to find a gist. But we don’t have to worry about that anymore, because with LangChain we can easily summarize lengthy texts in a short amount of time.

You can create tools using LangChain that handle text summarization effectively. By using strong language models like ChatGPT, your application can produce precise and succinct summaries of lengthy texts, enabling your customers to quickly understand the core ideas of complicated publications. 

  2. Question Answering 

Building applications with question-answering features is another application for LangChain. By integrating with a large language model, your application can receive user-inputted text data and extract relevant answers from a variety of sources, such as Wikipedia, Notion, or Apify Actors.  Users looking for fast and dependable information on various subjects can find great value in this feature. 

  3. Chatbots (Language Model) 

Chatbots came into fashion a long time ago, but ever since ChatGPT they have become much more popular. As a result, the demand for building such tools has risen among coders as well. Gone are the days when one had to write large amounts of code from scratch to build such applications. With the advent of LangChain, this task has become super easy.

Your chatbot applications may offer a more organic and interesting user experience by utilizing LangChain’s framework and components. The language model will produce context-aware responses when users interact with your chatbots for general discussion, support questions, or other particular objectives. 

These application examples are just a few instances of how LangChain may be used to create robust and adaptable apps. Understanding the advantages of various language models can help you develop creative solutions that are tailored to the requirements of your users. 

Conclusion 

In conclusion, the framework and modules provided by LangChain streamline the development procedure, enabling programmers to take advantage of language models’ full potential and produce sophisticated, data-aware applications.  

LangChain’s modular design and thorough documentation increase its adaptability and customizability possibilities. Applications like text summarization, chatbots, and question-and-answer systems can all be created using LangChain, providing quick and precise language processing solutions.

In general, LangChain enables individuals, developers, and organizations to put language models to work. It has drawn a great deal of attention since its release because of its robust features and easy-to-use technology.

Root cause analysis in Power BI

Microsoft Power BI has some great AI visuals which can provide an in-depth analysis of your data. In our last post we talked about an AI visual – Key Influencer visual. This visual helps in identifying factors that can impact an outcome. In that post, we analyzed what factors influence employee attrition. We also deep-dived into segments and clusters contributing to employee attrition with graphs and charts.

In this post, we will analyze and play with another AI visual – Decomposition tree.

Decomposition tree

The decomposition tree breaks down a numerical measure into parts and analyzes which factors cause the measure to be high or low.

From Microsoft documentation:

The decomposition tree visual in Power BI lets you visualize data across multiple dimensions. It automatically aggregates data and enables drilling down into your dimensions in any order. It is also an artificial intelligence (AI) visualization, so you can ask it to find the next dimension to drill down into based on certain criteria. This makes it a valuable tool for ad hoc exploration and conducting root cause analysis.

Microsoft

Let’s take a well known example of employee attrition and understand why attrition is high. From the decomposition tree visual we plan to get answers to the following question:

What causes employee attrition to be high?

At the end of this post you will have an idea of how to use this visual for exploratory and visual analysis, decomposition of values by factors, and how you can use AI splits to dynamically split and understand the next factor for drill down.

Our final output could look like:

Getting Started

We install the latest version of Power BI Desktop, and click on the decomposition tree visual.

Power BI AI visual – Decomposition tree

You see two input fields, “Analyze” and “Explain by”. In the Analyze field we put “Attrition %”, and in Explain by we put several other fields, say “Overtime”, “Department”, “BusinessTravel”, “MaritalStatus”, “Gender” etc. How do we choose these fields in the first place? That’s a tricky question and we will answer it later.

Our decomposition tree when we drag Attrition % looks like:

Decomposition Tree with Attrition % metric

Attrition % overall is 16.12%. Our next step once we have added our metric is to understand:

  1. Which of the factors cause attrition % to be high?
  2. Which of the factors cause attrition % to be low?

Remember that we dragged several fields into the “Explain by” section? Let’s click on the “+” sign next to the Attrition % bar.

You see the fields you have dragged. In addition, you see two more fields – High value and Low value.

Exploratory and Ad-hoc analysis

We begin with exploratory analysis by analyzing Attrition % by OverTime. Attrition % is 30.53% if OverTime is Yes, almost twice the overall 16.12%. This means employees who work overtime are more likely to leave.

OverTime

Let’s expand this level: when OverTime is Yes, what is the next factor that contributes to Attrition %? Let’s explore Marital Status.

Marital Status

Attrition is high among unmarried individuals who work overtime. Let’s try adding another level to this analysis, say Department.

Department

Unmarried individuals in the Sales department who work overtime have an attrition rate of 65.31%! We can also verify this number by adding tooltips.

Out of 49 employees in Dept Sales with Marital Status Single and OverTime Yes, 32 of them left the company.
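The tooltip figures can be checked with a quick calculation. The helper name `attrition_pct` is made up for this sketch; the two counts are the ones quoted from the tooltip.

```python
def attrition_pct(left: int, total: int) -> float:
    """Share of employees who left, as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * left / total, 2)


# Figures from the tooltip: 32 of the 49 employees in this segment left.
print(attrition_pct(32, 49))  # 65.31
```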

What if we don’t start our analysis with OverTime? Let’s pick monthly income as the starting factor.

Monthly Income

The visual flow is quite different here! Attrition is highest when monthly income is low, in the Sales department, and when OverTime is Yes.

With the decomposition tree, you can perform root cause and exploratory analysis by playing with the multiple factors and dimensions. You not only get a deep understanding of what’s happening in your data set, but you can also visually understand the data in a tree format.

AI Analysis

We started analyzing the factors based on our domain knowledge and understanding of the dataset. What was our rationale for choosing OverTime as the starting point of our tree?

The decomposition tree comes with another option to split the tree using AI algorithms. Remember we had two more options in our tree “High value” and “Low value”? It’s time to utilize them.

Let’s start with a blank slate and this time instead of selecting OverTime, let’s select “High value”.

AI Split

As we keep selecting High value at each level of the tree, the algorithm identifies the next level on its own. In the example above the levels chosen were Monthly Income followed by OverTime, Education Field and JobSatisfaction. Attrition % is high when monthly income is between 0-2800, and so on.
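To build intuition for a “High value” split, here is a simplified sketch that tries every candidate field and keeps the field and value with the highest attrition rate. This is an assumption about the general idea, not Power BI's actual algorithm; the rows are hypothetical and the function names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample rows; the field names mirror the post, the values are made up.
ROWS = [
    {"OverTime": "Yes", "Department": "Sales", "left": True},
    {"OverTime": "Yes", "Department": "Sales", "left": True},
    {"OverTime": "Yes", "Department": "R&D", "left": False},
    {"OverTime": "No", "Department": "Sales", "left": True},
    {"OverTime": "No", "Department": "R&D", "left": False},
    {"OverTime": "No", "Department": "R&D", "left": False},
]


def rate_by_value(rows: List[dict], field: str) -> Dict[str, float]:
    """Attrition % for each distinct value of one field."""
    counts = defaultdict(lambda: [0, 0])  # value -> [left, total]
    for row in rows:
        counts[row[field]][0] += int(row["left"])
        counts[row[field]][1] += 1
    return {value: 100 * left / total for value, (left, total) in counts.items()}


def high_value_split(rows: List[dict], fields: List[str]) -> Tuple[str, str, float]:
    """Pick the (field, value) pair with the highest attrition %."""
    best = None
    for field in fields:
        for value, rate in rate_by_value(rows, field).items():
            if best is None or rate > best[2]:
                best = (field, value, rate)
    if best is None:
        raise ValueError("no fields or rows to split on")
    return best


print(high_value_split(ROWS, ["OverTime", "Department"]))
```

A “Low value” split would follow the same pattern with the comparison reversed.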

In AI splits you see a bulb icon next to the level name. Once you hover on the bulb icon you get to see why this level was chosen.

On hovering the bulb icon

You can also select “Low value”. Once you select the low value, you will observe that the factors and the analysis change.

A low value split

How to choose fields in “Explain by”?

Should we choose AI split or manual split?

How do we choose the fields in “Explain by”?

The best way to start analyzing the tree is using manual split based on the domain context and your understanding of the data. After 2-3 levels of manual split, you can then split the tree further using AI splits and understand the factors responsible for making a metric high or low.

There’s also a smart alternative to this. You can use “Key Influencer Visual” to understand what factors lead to Attrition = Yes. The visual will provide top factors impacting an outcome (attrition = yes), and you can put those factors in “Explain by” section of the Decomposition tree. When you run key influencer analysis on the employee attrition data set you will get the results as explained and shown in the previous blog post.

Power BI Key Influencer AI visual

You can put Age, OverTime, JobLevel, MonthlyIncome, YearsAtCompany and others in the Explain by section of the decomposition tree visual and start drilling down the data.

Conclusion

The decomposition tree is a smart visual for breaking down a numerical measure into components. This AI visual aids in root cause and deeper analysis as shown above. You can perform ad-hoc analysis for the problem in question, understand the breakdown of values using manual and AI splits, and combine it with other Power BI AI visuals to strengthen your analysis.

One last note: to get the best of the output and results from this visual, you may want to convert numerical attributes like age, income, etc into categorical values (or bins – Example above: monthly income is broken down into 0-2800, 2800-5000 etc. bins).
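If you prepare the data outside Power BI, the binning step can be sketched as below. The edges 0, 2800 and 5000 come from the example in this post; the open-ended top bin, the choice to put a boundary value in the upper bin, and the function name `income_bin` are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List

# Edges 0, 2800 and 5000 come from the post; the open-ended top bin is an assumption.
EDGES: List[int] = [0, 2800, 5000]


def income_bin(monthly_income: float) -> str:
    """Map a numeric income to a categorical bin label."""
    if monthly_income < EDGES[0]:
        raise ValueError("income must not be negative")
    # Index of the last edge that is <= the income (boundary values go to the upper bin).
    i = bisect_right(EDGES, monthly_income) - 1
    if i == len(EDGES) - 1:
        return f"{EDGES[-1]}+"
    return f"{EDGES[i]}-{EDGES[i + 1]}"


print([income_bin(x) for x in (1500, 2800, 4999, 12000)])
```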

PS: AI splits in the decomposition tree come with two analysis modes: absolute and relative. We will cover these in detail in the next blog post.

Next steps

If you are looking to explore the possibility of applying AI in your dataset or looking to evaluate the use of Power BI in your organization, don’t hesitate to contact us today.

Key driver analysis – What influences attrition?

Key driver/influencer analysis using the newly released Power BI “Key influencers” visual.

Key driver analysis or key influencer analysis is critical to understand what factors impact an outcome and/or what is the relative importance of a factor. Example:

What influences employee attrition? Overtime? Job Level?

What influences employee attrition in the Sales Executive role? Distance from home?

What influences customer attrition? High call rate? International Voice plan?

Knowing the answers to the questions above helps in decision making.

If employees in Job Role “Healthcare Representatives” leave the most because of the distance from home, maybe offer them fuel reimbursement, or accommodation expenses if they move closer to the office?


The newly released Power BI “Key influencers” visual (released as part of Feb 2019 Power BI Desktop release) aids such analysis very very quickly with no code! Crazy!

We applied this new visual to analyze what drives employee attrition, and I must say, I’m blown away by the outcome, ease of use, and comprehensiveness of the visual.

Download Power BI report and play with the visual.

But what does the result look like?

From the visuals above we can clearly see what influences our variable Attrition=Yes: OverTime, MaritalStatus, YearsAtCompany, JobSatisfaction, and so on.

Not only that, the visual also provides the values of the factors which influence our variable of interest the most.

How to interpret the visual?

The likelihood of attrition increases by 2.93x if employees are doing overtime. Or, Attrition is 2.93x more likely in the employees who are doing overtime.

Hmm… if you do overtime, you may quit. This is obvious.

The attrition is 2.18x more likely if employees are single!

Attrition is also high if the Department is Sales.

And so on.
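One way to read a multiplier such as 2.93x is as a ratio of two attrition rates: the rate with the factor divided by the rate for everyone else. That reading is an assumption (the post does not say how Power BI computes the number), and the helper `implied_baseline` is hypothetical. Under that assumption, the 30.53% attrition rate reported for OverTime = Yes in the decomposition tree post implies a baseline of roughly 10.4% for everyone else.

```python
def implied_baseline(rate_with_factor: float, multiplier: float) -> float:
    """Rate for everyone else, assuming multiplier = rate_with_factor / baseline."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return rate_with_factor / multiplier


# 2.93x is the OverTime multiplier quoted above; 30.53% is the attrition rate for
# OverTime = Yes reported in the decomposition tree post on the same data set.
baseline = implied_baseline(30.53, 2.93)
print(round(baseline, 2))  # 10.42
```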

The left-hand side view of Key influencers shows all the factors influencing our “Attrition=Yes” by a factor of 1.0 and above.

The right-hand side view shows the distribution of data with respect to the selected factor and Attrition either as a column chart or a scatter plot.

The dashed line shows Avg. Attrition % of all values except for the key influencer one (in this case except for OverTime = Yes)

There is another view of this visual where we can see Top segments with high attrition % and their characteristics.

Top segments view in Key influencers visual in Power BI

The visual identified 4 segments with high attrition % along with population count. Clicking on a bubble shows us the characteristics of that segment.

Top segments deep dive

Segment 1 with Attrition % as 57.6 has employees in Department Sales, DistanceFromHome > 11, JobLevel is high and OverTime is Yes.

Wow!

You can further drill down this segment by clicking on “Learn more about this segment” and see what other factors influence this segment.

Quick FAQs on Key influencer visuals and its outcome

Can I filter this visual?

Yes, you can. Example: Why are employees in job role “Healthcare Representative” leaving the company?

Filter the visual and the analysis changes!

Is the visual interactive?

Yes, you can select individual influencing factors and see the distribution of Attrition % by the selected factor.

Can I hover over the values in the scatter plot above?

Yes, you can!

Can I see the logic or p-values associated with factors or key influencers?

No, not yet. This visual is in preview mode. Power BI team may add this feature in the future. Not sure about this.

Can I just see the top X key influencers?

No, not yet. This visual is in preview mode. Power BI team may add this feature in the future. Not sure about this.

I do not see my key influencer in this visual?

Yes, this can happen. Based on my R code using RandomForest, Age should also be an influencing factor for attrition but doesn’t show up in Power BI visual.

See this scatter plot. As Age decreases, Attrition % increases. Maybe Power BI only checks whether an increase in a factor increases Attrition %, or maybe there are too few data points with lower age and high attrition.

As Age decreases, Attrition % increases

Can I export the data for segments?

No, not yet. This visual is in preview mode. Power BI team may add this feature in the future. Not sure about this.

Does this visual analyze multiple factors and provide conclusions?

I do not think so. In the example below, the likelihood of Attrition % increases by 11.58x if monthly income goes up. But why is that so? Could it be because for those employees the YearsAtCompany is also more?

Maybe Power BI visual needs to remove outliers.

Power BI visual, currently, doesn’t analyze this for us.

Why are employees leaving if we increase their monthly income?

I want to set this up for my data?

Ok, here are steps to achieve this.

Step 1: Download and Install Power BI Desktop Feb 2019 from here.

Step 2: Enable this visual from “Preview features”.

Step 3: Restart Power BI Desktop. Click on the visual highlighted to put it on the canvas.

Step 4: In the visual data options, drag the field to analyze in “Analyze”, and possible influencers in “Explain by”.

Note: The visual is evaluated on the table level of the field being analyzed. In this case, we are analyzing Attrition, and hence the visual runs at an Employee level. So you may not need aggregations on “Explain by” field. Otherwise, appropriate aggregations are required.

Step 5: In the visual select “Yes” in Attrition value. In your case, select the value you want to analyze.

Step 6: Share analysis with your boss/team/company, and say Thank you to us 🙂

No code, drag and drop solution to key influencers analysis in Power BI!

Simple, huh?

Thanks

Ranbeer

PS: This visual is currently not supported in Power BI Embedded, Publish to Web and Power BI Mobile scenarios.

AI + Power BI = Wow BI

I’m glad to inform my readers that Microsoft is adding new AI capabilities inside Power BI. And, these are no-code solutions.

Let’s check these 4 exciting new capabilities in detail, ordered by what I consider the quickest wins.

In this post, I will explain use cases with examples from multiple industries for each of the new capabilities coming in Power BI. Each is followed by a general approach to solving such problems, and then the new AI + Power BI approach.

Capability 1: Key Driver Analysis – or Key Influencer Analysis

Note: Per Microsoft this would be available to all Power BI users.

Suppose you have a dataset of employee attrition which includes details of the employees who are in the company, who left the company along with age, gender, salary, job role, satisfaction, education, years with current manager etc.

Your task is to find factors influencing attrition. Why are employees leaving the company? What segments of employees are leaving?

A general approach for answering such questions would be to use R or Python, fit a model (say using Random Forest algorithm) or use techniques like RFE (Recursive Feature Elimination) to find out top factors affecting our label – Attrition. More details on this general approach and how we did this using R and Power BI is mentioned in detail in our case study here.

With new AI capabilities in Power BI, this would be just a click away. The outcome of the analysis from Power BI would be shown as a kind of “lollipop” chart as shown below.

Image source: Microsoft

Example: When parental encouragement is true, the probability that a student plans to attend college increases by 1.8x.

Or, when an employee has spent more than 2 years with their current manager and their job satisfaction is low, attrition increases by 2.3%.

From the screenshot it is not clear how multiple driver analysis can be performed: Ex: When parental encouragement is true and Gender is male – what happens then? 

A contingency matrix would have helped in this case.
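As a rough illustration of what such a contingency matrix holds, the sketch below counts outcomes for every combination of two drivers. The observations are entirely hypothetical and the function name `contingency` is made up; it only shows the shape of the table, not any Power BI feature.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical observations: (parental_encouragement, gender, plans_college).
STUDENTS: List[Tuple[bool, str, bool]] = [
    (True, "male", True),
    (True, "male", True),
    (True, "female", True),
    (True, "female", False),
    (False, "male", False),
    (False, "male", True),
    (False, "female", False),
    (False, "female", False),
]


def contingency(rows: List[Tuple[bool, str, bool]]) -> Dict[Tuple[bool, str], Tuple[int, int]]:
    """Return {(encouragement, gender): (planning, total)} for each combination."""
    planning: Counter = Counter()
    total: Counter = Counter()
    for encouragement, gender, plans in rows:
        total[(encouragement, gender)] += 1
        planning[(encouragement, gender)] += int(plans)
    return {key: (planning[key], total[key]) for key in total}


table = contingency(STUDENTS)
for (encouragement, gender), (yes, n) in sorted(table.items()):
    print(f"encouragement={encouragement!s:5} gender={gender:6} plans college: {yes}/{n}")
```

Reading across the cells shows how the outcome changes when two drivers are combined, which a single-driver chart cannot show.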

Capability 2: Azure Cognitive Capabilities – Sentiment Analysis, Image tagging, object detection in Power BI

Note: Per Microsoft this would be a Power BI Premium capability

You started a campaign on Twitter and would like to analyze your users’ sentiments: positive or negative.

For a call center company, you would like to analyze chat scripts and identify the key items customers are talking about, right within your BI reports.

Or, an e-commerce company would like to detect objects in the images attached to customer reviews, and identify which product or brand is causing negative sentiment or triggering an “Andon Cord” pull.

A general approach would be to use Azure Cognitive APIs inside your Power BI report using Power Query (more about this later) using calls such as: Web.Contents(AzureAPICallWithParams).

Another general approach would be to develop and use custom Deep Learning models. A Twitter sentiment analysis (racial vs non-racial tweet) model was developed by us and is hosted in our GitHub repo.

With the new AI capabilities in Power BI, this could be just a matter of invoking a function from the Power BI ribbon. We do not know yet how users will invoke it. But it will definitely make our BI reports more comprehensive and improve decision making.

A snippet of such comprehensive report is attached below.

Image source: Microsoft

When this comes out in preview, we will have to see whether Microsoft has provided the ability to skip API calls for items already tagged or analyzed. Otherwise you will have to pay for every API call, even for repeats.

Capability 3: Automated Machine Learning models

Note: Per Microsoft this would be a Power BI Premium capability

Imagine that in your Power BI report, alongside the Sales Opportunity (Oppty) data, I provide a confidence or probability score for each Oppty. The Oppty owner can look at this number, decide which Oppties are most likely to be won, and focus their efforts on those.

A general approach to add this would be a data scientist developing such models and a developer integrating it inside the Power BI report, and a business analyst consuming the report.

With this new AI capability, Microsoft is targeting business analysts so they can build, train, and apply models right within the Power BI service without writing a single line of code. Isn’t that wow?

From the initial screenshots by Microsoft, it looks like this will be part of DataFlows (another new capability, which I will talk about in later posts)

Image source: Microsoft

When this feature is out in preview, we will have to see how easy it will be to do feature engineering: feature selection, normalization, pruning, binning etc. But this is sure to reduce the effort in the long term.

Capability 4: Use your existing Azure ML Models in Power BI

Note: Per Microsoft this would be available to all Power BI users.

This capability is more about easing collaboration between a data scientist and a business analyst.

Typically a data scientist builds models in Azure ML platform and publishes the model as API endpoint. A data analyst or engineer uses that model endpoint to predict outcomes and populate the data inside the BI report. This BI report is then consumed by a business analyst.

In the new AI approach, the models developed by a data scientist would be easily searchable in Power BI, and a new interface would be provided in Power BI to hook into a model and use it in reports.

There are no screenshots for this capability by Microsoft. 

—–

The public preview of these capabilities will be launched towards the end of Nov 2018.

We will evaluate these capabilities and post about them when they arrive.

What thoughts do you have on these capabilities? How are you going to use them?

Let us know.

Thank you,

Ranbeer Makin

References:

Power BI AI Capability Announcement

Power BI AI Capability Preview Signup Form