Kdnuggets RSS Feed

Evidence Counterfactuals for explaining predictive models on Big Data


Big Data generated by people — such as social media posts, mobile phone GPS locations, and browsing history — provides enormous predictive value for AI systems. However, explaining how these models make predictions from the data remains challenging. This interesting explanation approach considers how a model would behave if it didn’t have the original set of data to work with.

By Yanou Ramon, Applied Data Mining Research Group, U. of Antwerp.

Predictive models on Big Data: Mining a pool of evidence

Why did the model predict you’d be interested in this post, based on the hundreds of KDnuggets posts you read? Because you read the post about “explainable AI” and the post about “cracking open the black box”: if you had not read these posts, you would not have been predicted to be interested.

The above example is an imaginary “Evidence Counterfactual” for a model that would predict interest in this post, based on your browsing data on KDnuggets (much like targeted online advertising works these days). In this post, you’ll learn more about the Evidence Counterfactual, an approach for explaining the decisions of any predictive system that uses Big Data.

More companies are tapping into a rich pool of humanly-generated data (also referred to as “behavioral big data”). Think of a person liking Instagram posts, visiting different locations captured by their mobile GPS, browsing web pages, searching Google, making online payments, connecting to other people on LinkedIn, writing reviews on Reddit or Goodreads, and so on. Mining these massive behavioral traces leads to artificial intelligence (AI) systems with very high predictive performance in a variety of application areas,1 ranging from finance and risk to marketing.

The goal of these AI systems is to predict a variable of interest from these data, such as creditworthiness, fraudulent behavior, personality traits, or product interest. The input data are characterized by a large number of small pieces of evidence that the model uses to predict the output variable. Let’s refer to this as the “evidence pool.” These pieces of evidence are either “present” for an instance (e.g., a person in the data set) or “missing,” and each instance only has a relatively small portion of evidence present. As Foster Provost explains in this talk,2 a predictive model can be thought of as an evidence-combining system, where all pieces of evidence that are present for an instance can be used by the model to make predictions.

To illustrate more clearly how to see behavioral big data as a “pool of evidence,” think of a model that uses location data of people in New York City to predict whether someone is a tourist or an NY citizen. Out of all possible places to go to (the “evidence pool”), a person would only visit a small number of places each month (the “evidence of that person”). In a numerical data representation, each place is represented by a binary feature (see the columns in Figure 1), and the places someone visited get a corresponding nonzero value for that person. All places not visited by that person are “missing” pieces of evidence and get a corresponding zero value. In Figure 1, for example, Anna visited 85 places out of the 50,000 possible places used by the predictive model. She visited Times Square and Dumbo, but she did not visit Columbia University, making this piece of evidence missing.

Figure 1: Location data example.

Intuition behind the Evidence Counterfactual

It is not straightforward to interpret how predictive systems trained from behavioral footprint data make their decisions, either because of the modelling technique (it can be highly nonlinear such as Deep Learning models) or the data (very high-dimensional and sparse), or both.

To understand the reasons behind individual model predictions, Evidence Counterfactuals (or simply “counterfactuals”) have been proposed. This explanation approach (to the best of our knowledge first proposed for predictive modeling in this paper3 to explain document classifications) is mainly inspired by causal reasoning, where the goal is to identify a causal relationship between two events: an event A causes another event B if we observe a difference in B’s value after changing A while keeping everything else constant.4

The Evidence Counterfactual shows a subset of evidence of the instance (event A) that causally drives the model’s decision (event B). For any subset of evidence of an instance, we can imagine two worlds, identical in every way except that the evidence set is present in one world but not in the other. The first world is the “factual” world, whereas the unobserved world is the “counterfactual” world. The counterfactual outcome of the model is defined as the hypothetical value of the output under an event that did not happen (e.g., a set of pieces of evidence is no longer present for that instance). The counterfactual explanation can be defined as an irreducible set of evidence pieces such that, if it were no longer present for that instance, the model’s decision would be different. (We can also talk about “removing evidence” when making pieces of evidence “missing.”) The irreducibility means that removing any proper subset of the evidence in the explanation would not change the model’s decision.

To clarify this definition, consider the following Evidence Counterfactual as an explanation for why Anna was predicted as a tourist in our running location data example:

IF Anna did not visit Times Square and Dumbo, THEN the model’s prediction changes from tourist to NY citizen.

The pieces of evidence {Times Square, Dumbo} are a subset of Anna’s evidence (all the places she visited). Removing just Times Square or just Dumbo from her visited locations would not be sufficient to change the predicted class (this is the irreducibility of the Evidence Counterfactual). The “factual world” is the one that’s observed and includes all the places Anna visited. The “counterfactual world” that results in a predicted class change is identical to the factual world in every way except for the two locations Times Square and Dumbo.
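To make the notions of class change and irreducibility concrete, here is a minimal sketch in Python. The linear model, its weights, and the place names are illustrative assumptions, not the actual tourist model:

```python
from itertools import combinations

# Hypothetical linear evidence-combining model: each visited place adds weight,
# and a positive total score means the model predicts "tourist".
WEIGHTS = {"Times Square": 1.0, "Dumbo": 1.0, "Wall Street": 0.5}

def predict(visited):
    score = sum(WEIGHTS.get(place, 0.0) for place in visited) - 1.0
    return "tourist" if score > 0 else "NY citizen"

def is_counterfactual(visited, evidence_set):
    """True if removing evidence_set flips the prediction and the set is irreducible."""
    original = predict(visited)
    if predict(visited - evidence_set) == original:
        return False  # removing the set does not change the decision
    # Irreducibility: no proper subset of evidence_set already flips the decision.
    for k in range(1, len(evidence_set)):
        for subset in combinations(evidence_set, k):
            if predict(visited - set(subset)) != original:
                return False
    return True

anna = {"Times Square", "Dumbo", "Wall Street"}
print(is_counterfactual(anna, {"Times Square", "Dumbo"}))  # True
print(is_counterfactual(anna, {"Times Square"}))           # False: no class change
```

For the toy weights above, removing both Times Square and Dumbo flips the prediction, while removing either place alone does not, so {Times Square, Dumbo} is an irreducible counterfactual explanation.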

An important advantage of counterfactuals that immediately catches the eye in the above example is that they do not require all features used in the model (the “evidence pool”) or all the evidence of the instance (e.g., all places Anna visited) to be part of the explanation. This is especially interesting in the context of humanly-generated big data. How useful would an explanation be that shows the marginal contribution of each visited location to the prediction of being a tourist? Such an explanation would encompass hundreds of locations. The Evidence Counterfactual bypasses this issue by only showing the pieces of evidence that led to the decision (they are causal with respect to the decision) and that are relevant for that particular person (only locations visited by that person or, more generally, evidence of that particular instance can be part of the explanation).

To illustrate how counterfactual explanations can be used to explain models on big data, consider the well-known 20 Newsgroups data5 where we want to predict whether a document is about a “Medical” topic. Figure 2a shows all the words used in the predictive model and the evidence (i.e., words) of each document. The counterfactual explanation for why document 01’s predicted topic is Medical is shown in Figure 2b. There are 17 words that need to be removed from the document for the predicted topic to no longer be “Medical,” meaning there is quite a lot of evidence behind the model’s decision.

Figure 2a: Words used by the model and the evidence (words) of each document (Medical topic).

Figure 2b: Counterfactual explanation for document 01 (Medical topic).

Consider another model trained on the 20 Newsgroups data to predict documents with the topic “Atheism,” where we do not remove header data as a textual preprocessing step. Figure 3a/b shows how the Evidence Counterfactual can help to identify problems with the trained model. Even though document 01 was correctly classified, the header information is being used to differentiate documents with the topic “Atheism” from documents with other topics. This leads to predictions being made for arbitrary reasons that have no clear connection with the predicted topic (e.g., “psilink,” “p00261”). It is unlikely that this arbitrary information is useful when predicting topics of new documents. This example illustrates how Evidence Counterfactuals can be used for identifying issues with a predictive system (such as predictions being “right for the wrong reasons”) and how such explanations can be a starting point for improving the model and the data preprocessing.

Figure 3a: Atheism topic example.

Figure 3b: Atheism topic example.

For more illustrations of counterfactuals for explaining models on behavioral big data, visit this GitHub repository. There are tutorials on explaining gender predictions from movie viewing data with a Logistic Regression and a Multilayer Perceptron model, and topic predictions from news documents with a Support Vector Machine with a linear kernel.

Computing counterfactuals for binary classifiers

The huge dimensionality of behavioral data makes it infeasible to compute counterfactual explanations using a complete search algorithm (this search strategy would check all subsets of an instance’s evidence until an explanation is found).

Alternatively, a heuristic search algorithm can be used to find counterfactuals efficiently. In the original paper, a best-first search based on the scoring function of the model was proposed (the open-source Python code is available on GitHub). This scoring function is used to first consider subsets of evidence (features) that, when removed (i.e., the feature value is set to zero), reduce the predicted score the most in the direction of the opposite predicted class. These are the best-first feature combinations. This strategy has at least two weaknesses: 1) for some nonlinear models, removing one feature does not result in a predicted score change, so the search algorithm picks a random feature in the first iteration. This can result in counterfactuals with too many features in the explanation set, or a search time that becomes exponentially large because of the growing number of search iterations. 2) The search time is very sensitive to the size of the counterfactual explanation: the more evidence that needs to be removed, the longer it takes the algorithm to find the explanation.
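The best-first idea can be sketched in a few lines. The scoring function and word weights below are toy assumptions, not the paper’s open-source implementation; at each iteration, the feature whose removal moves the score furthest toward the opposite class is removed first:

```python
def best_first_counterfactual(present, score_fn, max_iter=30):
    """present: set of active (nonzero) features; score_fn: set -> score,
    where score > 0 means the positive predicted class."""
    removed = set()
    active = set(present)
    for _ in range(max_iter):
        if score_fn(active) <= 0:          # class flipped: explanation found
            return removed
        # Pick the feature whose removal decreases the score the most.
        best = max(active, key=lambda f: score_fn(active) - score_fn(active - {f}))
        active.discard(best)
        removed.add(best)
    return None                            # no counterfactual found

# Hypothetical linear scorer over word features.
weights = {"doctor": 1.2, "patient": 0.9, "disease": 0.8, "the": 0.05}
score = lambda feats: sum(weights.get(f, 0.0) for f in feats) - 1.5
doc = {"doctor", "patient", "disease", "the"}
print(best_first_counterfactual(doc, score))  # the explanation set
```

For this toy scorer, the search removes “doctor” and then “patient,” at which point the score drops below zero and the explanation {doctor, patient} is returned.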

As an alternative to the best-first search, we proposed in this paper6 a search strategy that selects features to consider in the explanation according to their overall importance for the predicted score. The importance weights can be computed by an additive feature attribution technique, such as the popular explanation technique LIME. The idea is that the more accurate the importance ranking is, the more likely it is to find a counterfactual explanation by removing features from the top of the ranking downward until a counterfactual is found. The hybrid algorithm LIME-Counterfactual (LIME-C) seems to be a favorable alternative to the best-first search because of its overall good effectiveness (a high percentage of small-sized counterfactuals found) and efficiency. Another interesting upshot of this paper is that it solves an important issue of importance-ranking methods (like LIME) for high-dimensional data, namely: how many features should be shown to the user? For counterfactuals, the answer is the number of features that results in a predicted class change.
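The ranking-based strategy can be sketched as follows. The importance weights here are hard-coded stand-ins for what LIME-C would compute with LIME, and the classifier is a toy model:

```python
def ranked_counterfactual(present, importances, predict_fn):
    """Remove features from most to least important until the class flips."""
    original = predict_fn(set(present))
    active = set(present)
    removed = []
    for feat in sorted(present, key=lambda f: importances.get(f, 0.0), reverse=True):
        active.discard(feat)
        removed.append(feat)
        if predict_fn(active) != original:
            # len(removed) also answers "how many features to show the user".
            return removed
    return None

# Hypothetical importance ranking (LIME-C would compute this with LIME)
# and a toy classifier over word features.
importances = {"doctor": 0.9, "patient": 0.5, "the": 0.1}
weights = {"doctor": 1.0, "patient": 0.8, "the": 0.2}
classify = lambda feats: "Medical" if sum(weights.get(f, 0) for f in feats) > 0.9 else "Other"
print(ranked_counterfactual({"doctor", "patient", "the"}, importances, classify))
# ['doctor', 'patient']
```

Removing the two top-ranked words flips the toy prediction from “Medical” to “Other,” so the counterfactual explanation contains exactly those two features.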

Other data and models

Evidence Counterfactuals can address various data types, from tabular data to textual data to image data. The focal issue is to define what it means for evidence to be “present” or “missing.” To compute counterfactuals, we thus need to define the notion of “removing evidence” or setting evidence to “missing.”

In this post, we focused on behavioral big data. For these data, which are very sparse (many zero values in the data matrix), it makes sense to represent evidence that’s present by the corresponding features (e.g., a word or behavior) having a nonzero value. The absence of a piece of evidence is represented by a zero value for that feature.

For image data, the Evidence Counterfactual shows which parts of the image need to be “removed” to change the predicted class. Removing parts of the image can correspond to setting the pixels to black or blurring that part.7 For tabular data (think of data that can be shown in a standard Excel file), which has both numerical and categorical variables, the “missingness” of features can correspond to replacing the feature value with the mean or the mode, for numerical and categorical features respectively.8
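A minimal sketch of this tabular notion of “missingness,” with a made-up three-row training set (following reference 8: mean for numerical features, mode for categorical ones):

```python
from statistics import mean, mode

# Tiny illustrative training set.
train = [
    {"age": 30, "city": "NY"},
    {"age": 40, "city": "NY"},
    {"age": 50, "city": "LA"},
]

def remove_evidence(instance, feature):
    """Make a feature 'missing' by replacing its value with the training
    mean (numerical) or mode (categorical)."""
    values = [row[feature] for row in train]
    replacement = mean(values) if isinstance(values[0], (int, float)) else mode(values)
    counterfactual = dict(instance)
    counterfactual[feature] = replacement
    return counterfactual

print(remove_evidence({"age": 25, "city": "LA"}, "age"))   # age replaced by the mean
print(remove_evidence({"age": 25, "city": "LA"}, "city"))  # city replaced by the mode
```

For this toy training set, “removing” age replaces 25 with the mean (40), and “removing” city replaces “LA” with the mode (“NY”).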

Key takeaways

  • Predictive systems trained on humanly-generated Big Data have high predictive performance, but explaining them is challenging because of the modeling technique (e.g., Deep Learning), the dimensionality of the data, or both.
  • Explaining data-driven decisions is important for a variety of reasons (increase trust and acceptance, improve models, inspect misclassifications, aid in model use, gain insights, etc.), and for many different stakeholders (data scientists, managers, decision subjects, etc.).
  • The Evidence Counterfactual is an explanation approach that can be applied across many relevant applications and highlights a key subset of evidence of an instance that led to a particular model decision. It shows a set of evidence such that, when removing this evidence, the model’s decision would be different.

GitHub resource


  1. Junqué de Fortuny, E., Martens, D., Provost, F., Predictive Modeling with Big Data: Is Bigger Really Better?, Big Data, 1(4), pp215-226, 2013
  2. Provost, F., Understanding decisions driven by big data: from analytics management to privacy-friendly cloaking devices, Keynote Lecture, Strata Europe, (2014)
  3. Martens, D., Provost, F., Explaining data-driven document classifications, MIS Quarterly, 38(1), pp73-99 (2014)
  5. 20 Newsgroups data set:
  6. Ramon, Y., Martens, D., Provost, F., Evgeniou, T., Counterfactual Explanation Algorithms for Behavioral and Textual Data, arXiv:1912.01819 (2019). Available online
  7. Vermeire, T., Martens, D., Explainable Image Classification with Evidence Counterfactual, arXiv:2004.07511. Available online
  8. Fernandez, C., Provost, F., Han, X., Explaining data-driven decisions made by AI systems: the counterfactual approach, arXiv:2001.07417 (2019). Available online

Bios: Yanou Ramon graduated in 2018 as a business engineer from the University of Antwerp (Faculty of Business and Economics). She now works as a PhD student at the University of Antwerp under Professor David Martens (Applied Data Mining group). The topic of her dissertation is on making it easier for humans to understand and interact with predictive models on Big Data by using (post-hoc) techniques to explain model decisions, both on the instance and global level.

David Martens is a Professor at the University of Antwerp, where he heads the Applied Data Mining group. His work focuses on the development and application of data mining techniques for very high-dimensional (behavior) data and the use thereof in business domains such as risk, marketing, and finance. A key topic in his research relates to the ethical aspects of data science and the explainability of prediction models.



Top Stories, May 11-17: Start Your Machine Learning Career in Quarantine; AI and Machine Learning for Healthcare


Also: Satellite Image Analysis with for Disaster Recovery; Machine Learning in Power BI using PyCaret; Deep Learning: The Free eBook; 24 Best (and Free) Books To Understand Machine Learning

Most Popular Last Week

  1. Start Your Machine Learning Career in Quarantine, by Ahmad Anis
  2. Deep Learning: The Free eBook, by Matthew Mayo
  3. 24 Best (and Free) Books To Understand Machine Learning
  4. I Designed My Own Machine Learning and AI Degree
  5. How to select rows and columns in Pandas using [ ], .loc, .iloc, .at and .iat
  6. Five Cool Python Libraries for Data Science
  7. The Elements of Statistical Learning: The Free eBook

Most Shared Last Week

  1. AI and Machine Learning for Healthcare, by Tirthajyoti Sarkar – May 14, 2020.
  2. Start Your Machine Learning Career in Quarantine, by Ahmad Anis – May 11, 2020.
  3. Satellite Image Analysis with for Disaster Recovery – May 14, 2020.
  4. Machine Learning in Power BI using PyCaret – May 12, 2020.
  5. The Elements of Statistical Learning: The Free eBook – May 11, 2020.
  6. What You Need to Know About Deep Reinforcement Learning – May 12, 2020.
  7. AI Channels to Follow – May 15, 2020.

Most Popular Past 30 Days

  1. Five Cool Python Libraries for Data Science
  2. The Super Duper NLP Repo: 100 Ready-to-Run Colab Notebooks
  3. Natural Language Processing Recipes: Best Practices and Examples
  4. 24 Best (and Free) Books To Understand Machine Learning
  5. Free High-Quality Machine Learning & Data Science Books & Courses: Quarantine Edition
  6. Mathematics for Machine Learning: The Free eBook
  7. How to select rows and columns in Pandas using [ ], .loc, .iloc, .at and .iat

Most Shared Past 30 Days

  1. Free High-Quality Machine Learning & Data Science Books & Courses: Quarantine Edition – Apr 22, 2020.
  2. Beginners Learning Path for Machine Learning – May 05, 2020.
  3. Should Data Scientists Model COVID19 and other Biological Events – Apr 22, 2020.
  4. The Super Duper NLP Repo: 100 Ready-to-Run Colab Notebooks – Apr 24, 2020.
  5. Natural Language Processing Recipes: Best Practices and Examples – May 01, 2020.
  6. AI and Machine Learning for Healthcare – May 14, 2020.
  7. Deep Learning: The Free eBook – May 04, 2020.

Easy Text-to-Speech with Python



Python comes with a lot of handy and easily accessible libraries and we’re going to look at how we can deliver text-to-speech with Python in this article.

By Dhilip Subramanian, Data Scientist and AI Enthusiast



Text-to-speech (TTS) technology reads digital text aloud. It can take the words on computers, smartphones, and tablets and convert them into audio. All kinds of text files can be read aloud, including Word and Pages documents and online web pages. TTS can help kids who struggle with reading. Many tools and apps are available to convert text into speech.


Different APIs are available in Python to convert text to speech. One such API is Google Text-to-Speech, commonly known as the gTTS API. It is a very easy-to-use library that converts the entered text into an audio file, which can be saved as an mp3 file. It supports several languages, and the speech can be delivered at either of two speeds, fast or slow. More details can be found here

Convert Text into Speech


Import gTTS library and “os” module in order to play the converted audio

from gtts import gTTS 
import os

Creating a text that we want to convert into audio

text = "Global warming is the long-term rise in the average temperature of the Earth's climate system"

gTTS supports multiple languages. Please refer to the documentation here. Selected ‘en’ -> English and stored in the language variable

language = 'en'

Creating an object called speech and passing the text and language to the engine. Setting slow = False tells the module that the converted audio should be read at the faster of the two speeds.

speech = gTTS(text = text, lang = language, slow = False)

Saving the converted audio in an mp3 file named “text.mp3”

speech.save("text.mp3")

Playing the converted file, using Windows command ‘start’ followed by the name of the mp3 file.

os.system("start text.mp3")



text.mp3 file

The output of the above program is saved as the file text.mp3. Playing the mp3 file, you should hear a voice say, “Global warming is the long-term rise in the average temperature of the Earth’s climate system.”

Convert a Text File into Speech

Here, we convert a text file into speech by reading the text file and passing it to the gTTS module.


Import gTTS and os library

from gtts import gTTS 
import os

Reading the text file and storing it in an object called file. My file name is “draft.txt”

file = open("draft.txt", "r").read().replace("\n", " ")

Choosing language English

language = 'en'

Passing the text file into gTTS module and store into speech

speech = gTTS(text = str(file), lang = language, slow = False)

Saving the converted audio in an mp3 file named “voice.mp3”

speech.save("voice.mp3")

Playing the mp3 file

os.system("start voice.mp3")



Converted draft.txt file into voice.mp3

The draft.txt file is saved as voice.mp3. Play the mp3 file to listen to the text from the draft.txt file.


gTTS is an easy tool to convert text to voice, but it requires an internet connection to operate because it depends entirely on Google to get the audio data.
Thanks for reading. Keep learning and stay tuned for more!

Bio: Dhilip Subramanian is a Mechanical Engineer and has completed his Master’s in Analytics. He has 9 years of experience with specialization in various domains related to data including IT, marketing, banking, power, and manufacturing. He is passionate about NLP and machine learning. He is a contributor to the SAS community and loves to write technical articles on various aspects of data science on the Medium platform.

Original. Reposted with permission.



Linear Algebra and Optimization for Machine Learning: A Textbook


This book teaches linear algebra and optimization as the primary topics of interest, and solutions to machine learning problems as applications of these methods. Therefore, the book also provides significant exposure to machine learning.

Sponsored Post.

Linear Algebra and Optimization for Machine Learning: A Textbook (Springer), authored by Charu C. Aggarwal, May 2020.

Table of Contents


PDF Download Link (Free for computers connected to subscribing institutions only). The PDF version has links for e-readers, and is preferable in terms of equation formatting to the Kindle version.

Buy hardcover from Springer or Amazon (for general public)

Buy low-cost paperback edition (MyCopy link on right appears only for computers connected to subscribing institutions)

A frequent challenge faced by beginners in machine learning is the extensive background requirement in linear algebra and optimization. This makes the learning curve very steep. This book, therefore, reverses the focus by teaching linear algebra and optimization as the primary topics of interest, and solutions to machine learning problems as applications of these methods. Therefore, the book also provides significant exposure to machine learning. The chapters of this book belong to two categories:

Linear algebra and its applications: These chapters focus on the basics of linear algebra together with their common applications to singular value decomposition, similarity matrices (kernel methods), and graph analysis. Numerous machine learning applications have been used as examples, such as spectral clustering, kernel-based classification, and outlier detection.

Optimization and its applications: Basic methods in optimization such as gradient descent, Newton’s method, and coordinate descent are discussed. Constrained optimization methods are introduced as well. Machine learning applications such as linear regression, SVMs, logistic regression, matrix factorization, recommender systems, and K-means clustering are discussed in detail. A general view of optimization in computational graphs is discussed together with its applications to backpropagation in neural networks.

Exercises are included both within the text of the chapters and at the end of the chapters. The book is written for a diverse audience, including graduate students, researchers, and practitioners. The book is available in both hardcopy (hardcover) and electronic versions. If an electronic version is desired, it is strongly recommended to buy the PDF version (as opposed to the Kindle version, for which it is hard for Springer to control the layout and formatting of equations) at the following Springerlink pointer. For subscribing institutions, click from a computer directly connected to your institution network to download the book for free. Springer uses the domain name of your computer to regulate access. To be eligible, your institution must subscribe to “e-book package english (Computer Science)” or “e-book package english (full collection)”. If your institution is eligible, you will see a (free) “Download Book” button.


Cartoon: The Worst Telemedicine?


New KDnuggets cartoon examines what may be the worst application of telemedicine during the coronavirus crisis.

Cartoon: The Worst Telemedicine and Coronavirus

Doctor (via a screen):

Now move the drill a little to the right and drill harder!

Here are other KDnuggets AI, Big Data, Data Science, and Machine Learning Cartoons.

See also other recent KDnuggets Cartoons.


Facebook Open Sources Blender, the Largest-Ever Open Domain Chatbot


The new conversational agent exhibits human-like behavior in conversations about almost any topic.

Natural language understanding (NLU) has been one of the most active areas adopting state-of-the-art deep learning technologies. Today, we have dozens of mainstream NLU stacks that enable the implementation of decently sophisticated conversational agents with minimal effort. However, the vast majority of conversational models remain highly constrained to a single subject. The industry refers to these agents as closed-domain chatbots. The opposite of closed-domain chatbots would be conversational agents that can engage in conversations across a multitude of topics, simulating a human conversational style. We call these agents open-domain chatbots, and they are incredibly difficult to implement. Recently, the Facebook artificial intelligence research (FAIR) team unveiled the research and open-source code for Blender, the largest-ever open-domain chatbot.

The quest for building open-domain conversational agents that can mimic human-style dialogs is a key focus of NLU research for several reasons. Language is a fundamental element in the development of human intelligence from our infant days. Throughout that process, we acquire a series of skills, such as the ability to listen, empathy, and aligning different responses with a consistent point of view or set of values, which are essential pieces of human communication. While we still don’t understand the neuroscientific architecture of those capabilities, we can agree that recreating them in NLU agents is required to achieve human-level communication. Not surprisingly, many of the companies pursuing research in open-domain chatbots are technology giants heavily invested in conversational interfaces. A few months ago, Google unveiled the research behind Meena, a conversational agent that could engage in dialogs across different topics. Despite those efforts, the implementation of open-domain chatbots remains incredibly challenging. In particular, three key challenges remain crucial for the implementation of open-domain chatbots with the current generation of NLU technologies.

  1. Large-Scale Pre-Training: Building open-domain chatbots today requires massively large pretrained models. This approach has been proven by recent language agents such as Google’s BERT or Microsoft’s Turing-NLG.
  2. Blending Skills: Abilities such as empathy, unique personality or contextualized knowledge are essential for good conversations.
  3. Human Subjectivity: There is no effective way to quantify a human-like conversation. For that, we still rely on human judgment. Research has shown that subjective aspects, such as the length of an answer, can affect human judgments of quality.


Blender is an open-source open-domain chatbot released as part of the ParlAI project. Blender is able to engage in a large variety of conversations across nearly any topic while displaying human-like characteristics such as empathy and personable levels of engagement. In order to achieve that, the Facebook team had to directly address some of the challenges outlined in the previous section.

Pretraining Scale

Blender is based on a transformer architecture similar to projects like BERT or Turing-NLG. The current version of Blender uses a pretrained neural network of 9.4 billion parameters. A neural network that big cannot run on a single device. As a result, Blender uses a column-wise parallelism technique to split the model into smaller neural networks that can execute in parallel while maintaining high levels of efficiency.

Blending Skills

In order to evaluate Blender on different human-like conversational skills, the Facebook team relies on a parallel research effort known as Blended Skill Talk (BST). BST is a new dataset and benchmark that combines several existing datasets to evaluate abilities such as knowledge and empathy in conversational agents.

The use of BST allowed Blender to learn different behaviors such as changing tones to appear empathetic to the other party or properly reacting to jokes.

Generation Strategies

As previously explained, aspects such as the length of an answer can have a strong impact on the quality of the conversation. To control that, Blender relies on a fine-tuned model for hyperparameter search, which helps balance the tradeoffs between knowledge display and length.

Blender Architecture

Blender is a combination of three Transformer architectures that optimize different aspects of an open domain chatbot.

  1. Retriever: The Retriever Transformer receives a dialog history as input and selects the next utterance. This is typically done by selecting the response with the highest score across all possible responses in the training set.
  2. Generator: The Generator Transformer is a Seq2Seq model that generates responses instead of selecting them from the training dataset. Blender leverages Generator models included in the current version of ParlAI.
  3. Retrieve and Refine: This Transformer model attempts to refine the responses produced by traditional generative models, which are known to often hallucinate responses. The Retrieve and Refine architecture addresses this problem by introducing a retrieval step before the generation step and refining the retrieved response as much as possible. Blender uses two retrieval techniques, known as dialogue retrieval and knowledge retrieval.
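The retrieve-and-refine control flow described above can be sketched conceptually as follows. This is a toy illustration, not ParlAI’s actual API; the scorer and generator here are trivial stand-ins for the Retriever and Generator Transformers:

```python
def retrieve(history, candidates, score_fn):
    """Pick the highest-scoring candidate response for this dialogue history."""
    return max(candidates, key=lambda c: score_fn(history, c))

def retrieve_and_refine(history, candidates, score_fn, generate_fn):
    retrieved = retrieve(history, candidates, score_fn)
    # The generator conditions on the retrieved response appended to the
    # history, so it can refine it instead of hallucinating from scratch.
    return generate_fn(history + [retrieved])

# Toy stand-ins: score by word overlap with the last turn; the "generator"
# just marks the context it was given.
score = lambda hist, cand: sum(w in cand for w in hist[-1].split())
generate = lambda context: f"Refined: {context[-1]}"

print(retrieve_and_refine(["do you like jazz"],
                          ["I love jazz music", "The weather is nice"],
                          score, generate))
# Refined: I love jazz music
```

The point of the sketch is the ordering: retrieval narrows the space to a plausible response first, and generation then works from that anchor.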

Blender in Action

The current version of Blender includes different architectures with 90M, 2.7B, and 9.4B parameters, respectively. Not surprisingly, the initial tests showed that the larger models can achieve higher performance in fewer steps.



Facebook evaluated Blender using different benchmarks. Most notably, Blender was compared against Google’s Meena chatbot using pairwise human evaluations. Blender outperformed Meena in terms of both engagement and humanness.



Additionally, Blender was also evaluated against human responses, and the results were comparable. In fact, up to 49% of evaluators preferred Blender’s responses to those of humans.



The conversations produced by Blender are incredibly impressive. The examples below give us a glimpse of the level of engagement, vast knowledge, and vocabulary used by the conversational agent.



Blender represents an important milestone in the implementation of open-domain conversational agents. Even though Blender can still be repetitive and make mistakes, its performance showed that we might be just one or two breakthroughs away from achieving human-like conversational capabilities in AI agents.

Original. Reposted with permission.



5 Great New Features in Scikit-learn 0.23


Check out 5 new features of the latest Scikit-learn release, including the ability to visualize estimators in notebooks, improvements to both k-means and gradient boosting, some new linear model implementations, and sample weight support for a pair of existing regressors.


The latest release of Python's workhorse machine learning library is out: version 0.23 includes a number of new features and bug fixes. You can find the release highlights on the official Scikit-learn website, and the exhaustive release notes here.

Updating your installation is done via pip:

   pip install --upgrade scikit-learn

or conda:

   conda install scikit-learn
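To confirm the upgrade took effect, a quick version check (the exact output will vary with your installation):

```python
import sklearn
print(sklearn.__version__)  # e.g. '0.23.1'
```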

With that out of the way, here are 5 features in Scikit-learn’s latest release you should know about.

1. Visual Representation of Estimators in Notebooks

By calling Scikit-learn's set_config() function with display='diagram', one can globally enable visual summaries of the pipelines and composite estimators used in your Jupyter notebooks. The resulting diagrams are interactive, allowing sections such as pipelines, transformers, and more to be expanded. See an example of an expanded diagram below (from Scikit-learn's website).
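A minimal sketch of enabling the diagram display; the pipeline here is just an arbitrary example:

```python
from sklearn import set_config
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

set_config(display='diagram')  # enable HTML diagrams globally

pipe = Pipeline([('scale', StandardScaler()),
                 ('clf', LogisticRegression())])
pipe  # as the last expression in a notebook cell, this renders an expandable diagram
```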


2. Improvements to K-Means

The Scikit-learn implementation of k-means has been revamped. It now purports to be faster and more stable. OpenMP parallelism has also now been adopted, and so the joblib-reliant n_jobs training parameter has gone the way of the dodo. Check out the Scikit-learn parallelism notes for more info on thread control.

Also, the Elkan algorithm now supports sparse matrices.
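For example, the Elkan variant can now be run directly on a SciPy sparse matrix; the tiny two-cluster dataset below is just for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans

# four points in two well-separated clusters, stored as a sparse matrix
X = csr_matrix(np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]))

km = KMeans(n_clusters=2, algorithm='elkan', n_init=10, random_state=0).fit(X)
print(km.labels_)  # the first two points share one label, the last two the other
```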

3. Improvements to Gradient Boosting

Both the HistGradientBoostingClassifier and HistGradientBoostingRegressor have received numerous improvements. Support for early stopping has been introduced and is enabled by default for datasets with more than 10,000 samples. Monotonic constraints are also now supported, allowing predictions to be constrained based on specific features, which you can read more about here. The addition of a simple monotonic constraint is shown below.

from sklearn.ensemble import HistGradientBoostingRegressor
gbdt_cst = HistGradientBoostingRegressor(monotonic_cst=[1, 0]).fit(X, y)

HistGradientBoostingRegressor has added support for a new poisson loss as well.

gbdt = HistGradientBoostingRegressor(loss='poisson', learning_rate=.01)

4. New Generalized Linear Models

Three new regressors with non-normal loss functions have been added this time around: PoissonRegressor, GammaRegressor, and TweedieRegressor.

There isn’t a whole lot more to say superficially about these, other than they are implementations of a generalized linear regressor with these three different distributions. You can read more details in the generalized linear regression documentation.
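As a quick illustration, PoissonRegressor can be fit on synthetic count data much like any other linear model; the data below is made up for the example:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(400, 2))
y = rng.poisson(lam=np.exp(1 + 2 * X[:, 0]))  # counts driven mostly by feature 0

glm = PoissonRegressor(alpha=1e-3, max_iter=300).fit(X, y)
print(glm.coef_)  # roughly [2, 0]: the model recovers the generating coefficients
```

Because the Poisson GLM uses a log link, its predictions are always positive, which is exactly what you want for count targets.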

5. Sample Weight Support for Existing Regressors

We get some new regressors this time around, as outlined above, but we also get support for sample weighting in a pair of existing regressors, namely Lasso and ElasticNet. It is easily used: passed to the regressor's fit() method, sample_weight takes an array of length equal to the number of samples in the dataset (shown as randomly generated in the snippet below, from Scikit-learn's website).

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

n_samples, n_features = 1000, 20
rng = np.random.RandomState(0)
X, y = make_regression(n_samples, n_features, random_state=rng)
sample_weight = rng.rand(n_samples)
X_train, X_test, y_train, y_test, sw_train, sw_test = train_test_split(
    X, y, sample_weight, random_state=rng)
reg = Lasso().fit(X_train, y_train, sample_weight=sw_train)
print(reg.score(X_test, y_test, sw_test))

For more information on all of the changes and updates in Scikit-learn 0.23, have a look at the full release notes.

Happy machine learning!



AI Channels to Follow


AI is certainly playing an important role in our global fight against the novel coronavirus. These YouTube channels are recommended to keep you covered with the latest advancements in the field and how it is impacting our world.

By Evelyn Johnson, blogger about technology.

As the world continues to fight the novel coronavirus, every section of the society is playing its role. Medical professionals, law enforcement officers, and essential workers are on the frontline of this battle, while researchers, data scientists, and community leaders are also doing their part.

There has been a recent increased emphasis on how AI can boost humanity's efforts in this ongoing crisis. Experts believe Artificial Intelligence can analyze the published literature on the disease, study the structure and genome of the virus, and recommend existing drugs or help find new ones (in the latter case, it would take almost two years for a new drug to be approved by the FDA).

There has also been discussion of how AI can help governments be smart about social distancing policies. While lockdowns can prevent the spread of COVID-19, they can devastate the economy, as this research by ClothingRIC demonstrates. An AI-based approach would allow governments to monitor population movement, identify vulnerable communities, and implement social distancing without shutting down the economy.

Follow these AI Channels on YouTube to Stay Ahead

The entire debate surrounding AI’s role in combating the virus is constantly evolving. Some YouTubers are providing us a glimpse into the world where AI can be humanity’s best weapon against COVID-19.

Here are some AI channels that are necessary viewing for anyone pinning their hopes on AI in these trying times.

1.    Lex Fridman

Lex Fridman is a well-known voice for human-centered AI. The MIT researcher has appeared on Joe Rogan's podcast multiple times, and his YouTube channel has garnered more than 19 million views. He hosts his own aptly named Artificial Intelligence podcast, which has featured guests such as Twitter CEO Jack Dorsey and evolutionary biologist Richard Dawkins.

As someone who has advocated for human-robot collaboration, Lex Fridman's voice carries great weight in the current scenario. His podcast episode with Dmitry Korkin, a researcher in disease informatics, is a must-listen for anyone who wants insight into the coronavirus pandemic.

2.    Welcome.AI

Countries around the world are trying to curtail the coronavirus by reducing human-to-human contact. In this environment, robotics has once again taken center stage. Futurists believe robots doing essential work during the pandemic could save a great number of lives.

[embedded content]

Welcome.AI brings the latest videos of AI-powered machinery from all over the globe. Its demonstrations of self-driving cars and caregiver robots point to a potential solution to the current crisis.

3.    Raj Ramesh

Raj Ramesh has been explaining AI-specific issues since 2011. With crafty doodle videos, this channel explores subjects such as deep learning, computer vision, architecture, and business.

[embedded content]

One of Raj’s recent videos lays out a map for effective decision-making in the context of the prevailing crisis. It’s an insightful video that creates a framework powered by AI, which could prove to be a viable strategy in containing the coronavirus.

4.    The Medical Futurist

Dr. Bertalan Mesko holds a Ph.D. in genomics. His channel, The Medical Futurist, highlights the future of healthcare, which continues to be reshaped by disruptive technologies. The collaboration of AI and medicine is one of the core subjects covered on his channel.

[embedded content]

The Medical Futurist has a unique combination of expertise that makes him an important YouTuber in a coronavirus-affected world.


With the fight against the coronavirus entering a critical stage, AI can be a great weapon of choice for humanity. Algorithms can help devise the best treatments for the sick and curb further spread by helping governments apply social distancing more strategically.

Platforms such as YouTube amplify the debate on AI’s effectiveness against coronavirus. By following AI content creators on this site, netizens can not only gain a fresh perspective on Artificial Intelligence but also learn of its efficacy in the face of a global pandemic.

Bio: Evelyn Johnson is a writer by day and a reader by night, as well as a blogger and content marketer. She started her career as a junior writer in an ad agency, but over time her passion for food and travel grew so much that she began writing her own articles and blogs and getting them published on major websites.