Kdnuggets RSS Feed

Data Extraction Software Octoparse 8 vs Octoparse 7: What’s New


Octoparse 8 was recently released. Read this overview to get a better understanding of the differences between OP 8 and OP 7.

Sponsored Post.

Our brand-new version, OP 8, came out just a few weeks ago. To help you get a better understanding of the differences between OP 8 and OP 7, I have summarized all the updates in this article.


1. Faster Scraping Speed

We all know how valuable time is, and scraping speed is the core feature behind a high-quality extraction experience. In version 8.1, local run speed has improved significantly; it is 10 times faster than in the previous version. It took 21 minutes 41 seconds to extract 100 records of data in OP 7, but in OP 8 it takes only about 2 minutes to get the same amount of data from the same web pages!


Also, we strive hard to provide a stable and smooth scraping experience. OP 8.1 solves technical issues found in OP 7, such as stuttering, flashbacks, and software crashes. All the common UI interactions in OP 8.1, such as app launching, task configuration, and data export, can be completed within 0.5 seconds on average.

2. Robust Compatibility

1. Mac OS Compatible

The great news is that, besides Windows, OP 8.1 is available for Mac OS as well! Mac users have long voiced a strong demand for a Mac version. You spoke, we listened! So here it is!


2. Advanced Browser Compatibility

We switched the built-in browser engine from Firefox (used in OP 7) to Google Chrome (in OP 8). You’ll notice that websites that failed to load with OP 7 can now be loaded with OP 8.1 right away.

3. Hands-Free Workflow is now available

1. Auto-detect data fields

Another essential feature we upgraded is automation. Once you enter a webpage URL in OP 8, you will notice that Octoparse detects the website and guesses the data fields automatically. For users exhausted by writing XPath in OP 7, this brings huge convenience, as there is no need to build the crawlers from scratch.

2. Switch detection results to find the best fit

If the default detection results don’t suit the users’ needs, they can switch the results to capture other layers of data. Octoparse 8 auto-detects multiple layers of web data, which takes hands-free data extraction to another level.


3. Trigger Nested Web Page Extraction in One Click

A “nested web page” is the detail page reached by following certain links on a list page. In OP 7, if you needed to capture nested web page data, you had to create pagination to click through each listing and then scrape the detail page content. With OP 8, this process is greatly simplified: users can trigger the pagination simply by checking one button on the Tips Panel, and it will fetch the detail pages on its own.


(Nested Web Page Extraction)

4. A broader and clearer view of the sample data

With OP 7, you have to go back and forth between the browser and the control panel to view the data, whereas OP 8 has a data preview section that gives you a broader view of all the data columns before executing the task. What’s more, by pointing and clicking on a specific piece of data, Octoparse 8 will highlight it in the built-in browser, which helps you easily check its location.


(Data highlighted automatically)

4. A Refreshing Design for More Control

The overall design of OP 8 is very different from OP 7. In OP 7, the built-in browser sits below the workflow and the customization area. The OP 8 interface is super clean, with an upgraded menu bar and the workflow on the left-hand side of the browser.


1. Sidebar is smarter

The “Quick Filter” and “Recent Tasks” filters on the sidebar give you access to your recent projects with a simple click. Compared to OP 7, the new release lets you open a project quickly without having to go back to the dashboard and search through the entire list of scraping tasks. You can jump back in quickly even when the sidebar is collapsed.


2. Manage silos of tasks at your fingertips

As your project comes to involve multiple scrapers in Octoparse, you are likely to feel overwhelmed by the number of tasks: some need attention right away, while others are buried away for later access. In OP 7, we provided multiple filter options for you to sort them, such as cloud status, local status, task type, schedule status, task group, and so on.

In OP 8, filtering reaches a more fine-grained level. To view all tasks completed during a specific period of time, or tasks with a certain number of records extracted, simply define your filters at the top. You can also name and save your filters to access your desired tasks immediately next time.


(Octoparse 8 filters)

5. Upcoming Features

We expect to have a few more features added in the next release:

  • XPath Tool
  • Data export to JSON format
  • Workflow auto-debug

6. Final Notes

You can install both versions on your device. OP 7 supports Windows XP, 7, 8, and 10. OP 8.1 supports all of the above systems except Windows XP, plus Mac OS (it only supports x64 systems).

Before the full official release, we need to make sure the new version can hit the quality bar over the next two weeks, and we can’t make it without your feedback and suggestions. If you haven’t gotten your hands on the 8.1 Beta yet, take the latest release for a spin. If you encounter any issues, please reach out via bug reports or email us directly at


Top Stories, May 18-24: The Best NLP with Deep Learning Course is Free


Also: Automated Machine Learning: The Free eBook; Sparse Matrix Representation in Python; Build and deploy your first machine learning web app; Complex logic at breakneck speed: Try Julia for data science

Most Popular Last Week

  1. The Best NLP with Deep Learning Course is Free, by Matthew Mayo
  2. Build and deploy your first machine learning web app, by Tirthajyoti Sarkar
  3. Automated Machine Learning: The Free eBook
  4. Complex logic at breakneck speed: Try Julia for data science
  5. Easy Text-to-Speech with Python
  6. An easy guide to choose the right Machine Learning algorithm
  7. What they do not tell you about machine learning

Most Shared Last Week

  1. The Best NLP with Deep Learning Course is Free, by Matthew Mayo – May 22, 2020.
  2. Automated Machine Learning: The Free eBook, by Matthew Mayo – May 18, 2020.
  3. Sparse Matrix Representation in Python – May 19, 2020.
  4. Build and deploy your first machine learning web app – May 22, 2020.
  5. 13 must-read papers from AI experts – May 20, 2020.
  6. An easy guide to choose the right Machine Learning algorithm – May 21, 2020.
  7. Appropriately Handling Missing Values for Statistical Modelling and Prediction – May 22, 2020.

Most Popular Past 30 Days

  1. Five Cool Python Libraries for Data Science
  2. The Super Duper NLP Repo: 100 Ready-to-Run Colab Notebooks
  3. Natural Language Processing Recipes: Best Practices and Examples
  4. The Best NLP with Deep Learning Course is Free
  5. 24 Best (and Free) Books To Understand Machine Learning
  6. Free High-Quality Machine Learning & Data Science Books & Courses: Quarantine Edition
  7. Deep Learning: The Free eBook

Most Shared Past 30 Days

  1. The Best NLP with Deep Learning Course is Free – May 22, 2020.
  2. Automated Machine Learning: The Free eBook – May 18, 2020.
  3. Beginners Learning Path for Machine Learning – May 05, 2020.
  4. AI and Machine Learning for Healthcare – May 14, 2020.
  5. Natural Language Processing Recipes: Best Practices and Examples – May 01, 2020.
  6. Deep Learning: The Free eBook – May 04, 2020.
  7. Start Your Machine Learning Career in Quarantine – May 11, 2020.

Python For Everybody: The Free eBook


Get back to fundamentals with this free eBook, Python For Everybody, approaching the learning of programming from a data analysis perspective.

It’s a new week, which means it’s also time to profile and share a new free eBook. This week we get back to basics with Python For Everybody, written by Charles R. Severance, a book intended to develop or strengthen your foundational Python programming skills.


Python For Everybody was written as an accompanying text for Python for Everybody Specialization on Coursera, Python for Everybody (2 courses) on edX, and Python for Everybody (2 courses) on FutureLearn, all of which were also created by the book’s author.

This book is particularly suited to individuals looking to learn Python in the context of data science and data analytics, according to the author:

The goal of this book is to provide an Informatics-oriented introduction to programming. The primary difference between a computer science approach and the Informatics approach taken in this book is a greater focus on using Python to solve data analysis problems common in the world of Informatics.

First off, you should know that Python for Everybody uses Python 3, though an older version of the book using Python 2 is still available should you, for some reason, want to learn Python 2 (you should definitely not want to do so, however). It is also code-centric, not spending much time on programming theory but rather jumping right to implementation.

Simply put, Python For Everybody teaches you what you need to know about Python to get writing practically useful code right now, particularly from a data analysis perspective.

The book’s table of contents is as follows:

  1. Why should you learn to write programs?
  2. Variables, expressions, and statements
  3. Conditional execution
  4. Functions
  5. Iteration
  6. Strings
  7. Files
  8. Lists
  9. Dictionaries
  10. Tuples
  11. Regular expressions
  12. Networked programs
  13. Using Web Services
  14. Object-oriented programming
  15. Using Databases and SQL
  16. Visualizing data

A review of the Kindle version of this book on Amazon states the following:

I have not found a better beginner Python book out there. Plus, now that I am a professional Python programmer, I find myself constantly referring to this book to clarify certain points and reinforce understanding of basic principles. (I may be a professional, but I’m still a rookie.) I’ve purchased a few fat, expensive Python programming books with animals on the cover, but these tend to collect dust. In short, I have a new-found appreciation for this book and how much work went in to writing it. Thanks, Professor Severance!

And it’s not the only positive review; 448 ratings of the book with an average of 4.6 out of 5 should tell you that many others have also found Python for Everybody useful. The consensus seems to be that the book quickly covers concepts, does so in an easily understandable manner, and jumps right into the corresponding code.

Aside from English, the book is also available in Spanish, Italian, Portuguese, and Chinese. You can find further information and links to these editions on the book’s website.

Download the PDF here. You can optionally read the book as a series of interactive Jupyter notebooks here. If you like the book and want to support the author, paperback and electronic (Kindle) copies can be purchased on Amazon.

If you are new to data science and looking to get a grip on one of the field’s most dominant programming languages, the freely available Python for Everybody is a book that should be at the top of your list.



10 Useful Machine Learning Practices For Python Developers


While you may be a data scientist, you are still a developer at the core. This means your code should be skillful. Follow these 10 tips to make sure you quickly deliver bug-free machine learning solutions.

By Pratik Bhavsar, Remote NLP engineer.

Sometimes, as data scientists, we forget what we are paid for. We are primarily developers, then researchers, and then maybe mathematicians. Our first responsibility is to quickly develop solutions that are bug-free.

Just because we can make models doesn’t mean we are gods. It doesn’t give us the freedom to write crap code.

Since my start, I have made tremendous mistakes, and I thought of sharing what I see as the most important skills for ML engineering. In my opinion, they are also the most lacking skills in the industry right now.

I call them software-illiterate data scientists because a lot of them are non-CS, Coursera-baptized engineers. And I myself have been one. 😅

If it came to hiring between a great data scientist and a great ML engineer, I would hire the latter.

Let’s get started.

1. Learn to write abstract classes

Once you start writing abstract classes, you will see how much clarity they can bring to your codebase. They enforce the same methods and method names across implementations. When many people work on the same project without them, everyone starts creating different methods, which leads to unproductive chaos.
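As a minimal sketch of this idea (the `BasePreprocessor` name and its two methods are invented for illustration), Python’s `abc` module is one way to enforce a shared interface:

```python
from abc import ABC, abstractmethod

class BasePreprocessor(ABC):
    """Every preprocessor on the project must expose the same two methods."""

    @abstractmethod
    def fit(self, texts):
        ...

    @abstractmethod
    def transform(self, texts):
        ...

class LowercasePreprocessor(BasePreprocessor):
    def fit(self, texts):
        return self  # nothing to learn for this one

    def transform(self, texts):
        return [t.lower() for t in texts]
```

If a teammate forgets to implement `transform()` (or names it something else), instantiating their class raises a `TypeError` immediately instead of failing deep inside a pipeline.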

2. Fix your seed at the top

Reproducibility of experiments is very important, and the seed is our enemy. Catch hold of it. Otherwise, it leads to different train/test splits and different initialisations of the weights in the neural network, which produces inconsistent results.
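A minimal seed-fixing helper might look like the following; only the standard library is shown, and the commented lines note where NumPy/PyTorch/TensorFlow seeds would be pinned if those libraries are in use:

```python
import os
import random

def fix_seed(seed: int = 42) -> None:
    """Call this once at the very top of every script or notebook."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)  # picked up by subprocesses
    # np.random.seed(seed)        # NumPy
    # torch.manual_seed(seed)     # PyTorch
    # tf.random.set_seed(seed)    # TensorFlow

fix_seed(42)
first_run = [random.random() for _ in range(3)]
fix_seed(42)
second_run = [random.random() for _ in range(3)]
# Same seed, same "random" numbers, so splits and weight inits are repeatable.
```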

3. Get started with a few rows

If your data is too big and you are working on a later part of the code, such as cleaning or modeling, use nrows to avoid loading the huge dataset every time. Use this when you only want to test the code, not actually run the whole thing.

This is very applicable when your local PC’s configuration is not enough to handle the data size, but you still like doing development locally in Jupyter/VS Code/Atom.

import pandas as pd
df_train = pd.read_csv('train.csv', nrows=1000)  # load only the first 1,000 rows

4. Anticipate failures (the sign of a mature developer)

Always check for NAs in the data, because they will cause you problems later. Even if your current data doesn’t have any, that doesn’t mean they won’t appear in future retraining loops. So keep the checks anyway. 😆
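A sketch of such a guard, assuming a pandas DataFrame (the column names here are invented):

```python
import pandas as pd

def assert_no_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast with the offending columns instead of crashing mid-training."""
    na_counts = df.isna().sum()
    bad = na_counts[na_counts > 0]
    if not bad.empty:
        raise ValueError(f"Missing values found in: {dict(bad)}")
    return df

# Hypothetical frame: 'age' has a missing value that would bite us later.
df = pd.DataFrame({"age": [31, 42, None], "city": ["Tokyo", "Pune", "Oslo"]})
```

Calling `assert_no_missing(df)` here would raise immediately and name the `age` column, which is much easier to debug than a cryptic failure inside model training.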





5. Show the progress of processing

When you are working with big data, it definitely feels good to know how much time the processing is going to take and where you are in the whole process.

Option 1 — tqdm

Option 2 — fastprogress
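The original snippets are not reproduced in this repost, but Option 1 can be sketched as follows (with a no-op fallback so the loop still runs if tqdm is not installed; fastprogress’s `progress_bar` wraps an iterable the same way):

```python
import time

try:
    from tqdm import tqdm  # Option 1: wrap any iterable to get a live progress bar
except ImportError:
    def tqdm(iterable, **kwargs):  # no-op stand-in so the sketch runs anywhere
        return iterable

results = []
for row in tqdm(range(5), desc="processing"):
    time.sleep(0.01)  # placeholder for the real per-row work
    results.append(row * 2)
```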


6. Pandas can be slow

If you have worked with pandas, you know how slow it can get at times, especially with groupby. Rather than breaking our heads to find “great” solutions for a speedup, just use modin by changing one line of code.

import modin.pandas as pd

7. Time the functions

Not all functions are created equal.

Even if the whole code works, it doesn’t mean you wrote great code. Some soft bugs can actually make your code slower, and it’s necessary to find them. Use a decorator to log the execution time of your functions.

8. Don’t burn money on cloud

Nobody likes an engineer who wastes cloud resources.

Some of our experiments can run for hours. It’s difficult to keep track of them and to shut down the cloud instance when they’re done. I have made this mistake myself and have also seen people leave instances running for days.

This happens when we work on a Friday, leave something running, and only realise it on Monday. 😆

Just call a shutdown function at the end of execution, and your ass will never be on fire again!

But also wrap the main code in try and call this method in the except block as well, so that if an error happens, the server is not left running. Yes, I have dealt with these cases too. 😅

Let’s be a bit responsible and not generate CO2. 😅
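The shutdown helper itself is not shown in this repost, but the pattern can be sketched like this, with `stop_instance` as a placeholder for whatever your provider’s real API call is:

```python
def stop_instance():
    """Placeholder: swap in your cloud provider's real call here,
    e.g. boto3.client('ec2').stop_instances(InstanceIds=[...]) on AWS."""
    print("instance stopped")

def run_experiment():
    return "model trained"  # stand-in for hours of training

result = None
try:
    result = run_experiment()
finally:
    # Runs whether the experiment finished or crashed, so the
    # server is never left burning money over the weekend.
    stop_instance()
```

Using `finally` (rather than only calling the helper at the end) covers both the success path and the crash path in one place.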

9. Create and save reports

After a particular point in modeling, all great insights come only from error and metric analysis. Make sure to create and save well-formatted reports for yourself and your manager.

Anyway, management loves reports, right? 😆

10. Write great APIs

All that ends bad is bad.

You can do great data cleaning and modeling and still create huge chaos at the end. My experience with people tells me that many are not clear about how to write good APIs, documentation, and server setups.

Below is a good methodology for deploying classical ML and DL under a load that is not too high, like 1,000 requests per minute.

Meet the combo: FastAPI + uvicorn

  • Fastest: Write the API in FastAPI, because it is the fastest for I/O-bound workloads, as per this, and the reason is explained here.
  • Documentation: Writing the API in FastAPI gives us free documentation and test endpoints at http:url/docs, autogenerated and updated by FastAPI as we change the code.
  • Workers: Deploy the API using uvicorn.

Run these commands to deploy using 4 workers. Optimise the number of workers by load testing.

pip install fastapi uvicorn
uvicorn main:app --workers 4 --host --port 8000


Original. Reposted with permission.



LinkedIn Open Sources a Small Component to Simplify the TensorFlow-Spark Interoperability


Spark-TFRecord enables the processing of TensorFlow’s TFRecord structures in Apache Spark.

Interoperating TensorFlow and Apache Spark is a common challenge in real-world machine learning scenarios. TensorFlow is arguably the most popular deep learning framework on the market, while Apache Spark remains one of the most widely adopted data computation platforms, with a large install base across large enterprises and startups. It is only natural that companies will try to combine the two. While there are frameworks that adapt TensorFlow to Spark, the interoperability challenge is often rooted at the data level: TFRecord, the native data structure in TensorFlow, is not fully supported in Apache Spark. Recently, engineers from LinkedIn open sourced Spark-TFRecord, a new native data source for Spark based on the TensorFlow TFRecord.

The fact that LinkedIn decided to address this problem is not surprising. The internet giant has long been a broad adopter of Spark technologies and an active contributor to the TensorFlow and machine learning open source communities. Internally, LinkedIn’s engineering teams regularly had to implement transformations between TensorFlow’s native TFRecord format and Spark’s internal formats such as Avro or Parquet. The goal of the Spark-TFRecord project is to provide the native functionality of the TFRecord structure in Spark pipelines.

Prior Attempts

Spark-TFRecord is not the first project that attempts to solve the data interoperability challenges between Spark and TensorFlow. The most popular project in that realm is the Spark-TensorFlow-Connector, promoted by Spark’s creator Databricks. We have used the Spark-TensorFlow-Connector plenty of times, with various degrees of success. Architecturally, the connector is an adaptation of the TFRecord format into Spark SQL DataFrames. Knowing that, it shouldn’t be surprising that the Spark-TensorFlow-Connector works very effectively in relational data access scenarios but remains limited in other use cases.

If you think about it, an important part of a TensorFlow workflow is related to disk I/O operations rather than database access. In those scenarios, developers end up writing considerable amounts of code when using the Spark-TensorFlow-Connector. Additionally, the current version of the Spark-TensorFlow-Connector still lacks important functions, such as PartitionBy, that are regularly used in TensorFlow computations. Finally, the connector is more of a bridge for processing TensorFlow records in Spark SQL DataFrames than a native file format.

Factoring in those limitations, the LinkedIn engineering team decided to address the Spark-TensorFlow interoperability challenge from a slightly different perspective.


Spark-TFRecord is a native TensorFlow TFRecord data source for Apache Spark. Specifically, Spark-TFRecord provides the routines for reading and writing TFRecord data from/to Apache Spark. Instead of building a connector to process TFRecord structures, Spark-TFRecord is built as a native Spark data source, just like Avro, JSON, or Parquet. That means that all of Spark’s DataSet and DataFrame I/O routines are automatically available with Spark-TFRecord.

An obvious question worth exploring is why build a new data source instead of simply evolving the open source Spark-TensorFlow-Connector. Well, it seems that adapting the connector to disk I/O operations requires a fundamental redesign.

Instead of following that route, the LinkedIn engineering team decided to implement a new Spark FileFormat interface which is fundamentally designed to support disk I/O operations. The new interface would adapt the TFRecord native operations to any Spark DataFrames. Architecturally, Spark-TFRecord is composed of a series of basic building blocks that abstract reading/writing and serialization/deserialization routines:

  • Schema Inferencer: This is the component that is closest to the Spark-TensorFlow-Connector. This interface maps TFRecord representations into native Spark data types.

  • TFRecord Reader: This component reads TFRecord structures and passes them to the deserializer.

  • TFRecord Writer: This component receives a TFRecord structure from the serializer and writes it to disk.

  • TFRecord Serializer: This component converts Spark InternalRow structures to TFRecord structures.

  • TFRecord Deserializer: This component converts TFRecords to Spark InternalRow structures.

Using LinkedIn’s Spark-TFRecord is not different from other Spark native datasets. A developer simply needs to include the spark-tfrecord jar library and use the traditional DataFrame API to read and write TFRecords as illustrated in the following code:

import org.apache.spark.sql.{ DataFrame, Row }
import org.apache.spark.sql.catalyst.expressions.GenericRow
import org.apache.spark.sql.types._

val path = "test-output.tfrecord"
val testRows: Array[Row] = Array(
  new GenericRow(Array[Any](11, 1, 23L, 10.0F, 14.0, List(1.0, 2.0), "r1")),
  new GenericRow(Array[Any](21, 2, 24L, 12.0F, 15.0, List(2.0, 2.0), "r2")))
val schema = StructType(List(StructField("id", IntegerType),
                             StructField("IntegerCol", IntegerType),
                             StructField("LongCol", LongType),
                             StructField("FloatCol", FloatType),
                             StructField("DoubleCol", DoubleType),
                             StructField("VectorCol", ArrayType(DoubleType, true)),
                             StructField("StringCol", StringType)))

val rdd = spark.sparkContext.parallelize(testRows)

//Save DataFrame as TFRecords
val df: DataFrame = spark.createDataFrame(rdd, schema)
df.write.format("tfrecord").option("recordType", "Example").save(path)

//Read TFRecords into DataFrame.
//The DataFrame schema is inferred from the TFRecords if no custom schema is provided.
val importedDf1: DataFrame ="tfrecord").option("recordType", "Example").load(path)

//Read TFRecords into DataFrame using custom schema
val importedDf2: DataFrame ="tfrecord").schema(schema).load(path)

The interoperability between Spark and deep learning frameworks like TensorFlow is likely to continue being a challenging area for most organizations. However, projects like LinkedIn’s Spark-TFRecord that have been tested at a massive scale definitely help to simplify the bridge between these two technologies that are essential to so many modern machine learning architectures.

Original. Reposted with permission.



5 Machine Learning Papers on Face Recognition


This article will highlight some of that research and introduce five machine learning papers on face recognition.

By Limarc Ambalina, Gengo


Face recognition, or facial recognition, is one of the largest areas of research within computer vision. We can now use face recognition to unlock our mobile phones, verify identification at security gates, and in some countries, make purchases. With the ability to make numerous processes more efficient, many companies invest into the research and development of facial recognition technology. This article will highlight some of that research and introduce five machine learning papers on face recognition.

Essential Machine Learning Papers on Face Recognition

1. A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing

With a multitude of real-world applications, face recognition technology is becoming more and more prominent. From smartphone unlocking to face verification payment methods, facial recognition could improve security and surveillance in many ways. However, the technology also poses several risks. Numerous face spoofing methods could be used to fool these systems. Therefore, face anti-spoofing is essential to prevent security breaches.

In order to support face anti-spoofing research, the authors of this paper introduce a multi-modal face anti-spoofing dataset named CASIA-SURF. As of the writing of this paper, it is the largest open dataset for face anti-spoofing. Specifically, the dataset includes 21,000 videos of 1,000 subjects in RGB, Depth, and IR modalities. In addition to the dataset, the authors present a novel multi-modal fusion model as a baseline for face anti-spoofing.

Published / Last Updated – April 1st, 2019

Authors and Contributors – Shifeng Zhang (NLPR, CASIA, UCAS, China) , Xiaobo Wang (JD AI Research), Ajian Liu (MUST, Macau, China), Chenxu Zhao (JD AI Research), Jun Wan (NLPR, CASIA, UCAS, China), Sergio Escalera (University of Barcelona), Hailin Shi (JD AI Research), Zezheng Wang (JD Finance), Stan Z. Li (NLPR, CASIA, UCAS, China).


2. FaceNet: A Unified Embedding for Face Recognition and Clustering

In this paper, the authors present a face recognition system called FaceNet. This system uses a deep convolutional neural network which optimizes the embedding, rather than using an intermediate bottleneck layer. The authors state that the most important aspect of this method is the end-to-end learning of the system.

The team trained the convolutional neural network on a CPU cluster for 1,000 to 2,000 hours. They then evaluated their method on four datasets. Notably, FaceNet attained an accuracy of 99.63% on the famous Labeled Faces in the Wild (LFW) dataset, and 95.12% on the Youtube Faces Database.

Published / Last Updated – June 17th, 2015

Authors and Contributors – Florian Schroff, Dmitry Kalenichenko, and James Philbin, from Google Inc.


3. Probabilistic Face Embeddings

As of the writing of this article, current embedding methods used for face recognition are able to achieve high performance in controlled settings. These methods work by taking an image of a face and storing data about that face in a latent semantic space. However, when tested in fully uncontrolled settings, the current methods cannot perform as well. This is due to instances where facial features are absent from or ambiguous in the image. An example of such a case would be face recognition in surveillance videos, where the quality of the video may be low.

To help address this issue, the authors of this paper propose Probabilistic Face Embeddings (PFEs). The authors propose a method for converting existing deterministic embeddings into PFEs. Most importantly, the authors state that this method effectively improves performance in face recognition models.

Published / Last Updated – August 7th, 2019

Authors and Contributors – Yichun Shi and Anil K. Jain, from Michigan State University.


4. The Devil of Face Recognition is in the Noise

In this study, researchers from SenseTime Research, the University of California San Diego, and Nanyang Technological University studied the effects of noise in large-scale face image datasets. Many large datasets are prone to label noise, due to their scale and cost-effective nature. With this paper, the authors aim to provide knowledge around the source of label noise and the consequences it has in face recognition models. Additionally, they aim to build and release a clean face recognition dataset titled IMDb-Face.

Two of the main goals of the study were to discover the effects of noise on final performance, and determine the best strategy to annotate face identities. To do so, the team manually cleaned two popular open face image datasets, MegaFace and MS-Celeb-1M. Their experiments showed that a model trained on just 32% of their cleaned MegaFace dataset and 20% of the cleaned MS-Celeb-1M dataset achieved similar performance to models trained on the entirety of the original uncleaned datasets.

Published / Last Updated – July 31st, 2018

Authors and Contributors – Fei Wang (SenseTime), Liren Chen (University of California San Diego), Cheng Li (SenseTime), Shiyao Huang (SenseTime), Yanjie Chen (SenseTime), Chen Qian (SenseTime), and Chen Change Loy (Nanyang Technological University).


5. VGGFace2: A dataset for recognising faces across pose and age

Numerous studies have been done on deep convolutional neural networks for facial recognition. In turn, numerous large-scale face image datasets have been created to train those models. However, the authors of this paper state that the previously released datasets do not contain much data on pose and age variation in faces.

In this paper, researchers from the University of Oxford introduce the VGGFace2 dataset. This dataset includes images which have a wide range of age, ethnicity, illumination, and pose variations. In total, the dataset contains 3.31 million images and 9,131 subjects.

Published / Last Updated – May 13th, 2018

Authors and Contributors – Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman, from the Visual Geometry Group at the University of Oxford.


Hopefully, the machine learning papers on face recognition above helped strengthen your understanding of the work being done in the field.

New studies in face recognition are done every year. To keep up with the latest machine learning papers and other AI news, please subscribe to our newsletter. For more reading on facial recognition, please see the related resources below.

Bio: Limarc Ambalina is a Tokyo-based writer specializing in AI, tech, and pop culture. He has written for numerous publications including Hacker Noon, Japan Today, and Towards Data Science.
Original. Reposted with permission.