A practical guide for aspiring Data Scientists

I have received many calls from friends, and friends of friends, asking how to break into a Data Scientist job. So I thought that, for the benefit of everyone with similar questions, this could be a definitive place to find all the answers.

Data Scientist: Roles and Responsibilities

Though the exact scope of roles and responsibilities of a Data Scientist may vary a little depending on what type of company you join, the core responsibilities are pretty standard:

  1. Data Analysis and Visualisation
  2. Building AI/ML Models
  3. Cross-functional Communication Skills
  4. Basic to Intermediate Software Development Skills

Apart from these you may also be expected to…

With the RoBERTa Sentence Tokenizer

The Requirement

I was recently exploring multiple models for building a cognitive search bot for open-domain question answering. The bot should be capable of returning the appropriate answer to the question posed by the user. Of course, the efficacy of the bot is limited by the content of the training dataset. However, this bot should be robust enough to handle misspelled words — especially named entities (proper nouns such as names, places, animals, things).

For my prototype, I chose the WikiQA dataset from Microsoft Research. …

Learning from a practical NLP project

Incident ticket classification is a problem with a huge impact on IT companies. When users raise a ticket, it needs to be directed to the right team as quickly as possible to ensure speedy resolution. Sending it to the wrong department leads to longer resolution times, since it takes time for the ticket to be redirected to the right team. Currently, in most companies, ticket classification is done manually, which is error-prone and becomes tedious as the volume of tickets increases. Hence, we need a better solution to this problem.

Auto-ticket assignment: Problem

In order to…

An intuitive guide to calculating input shape and complexity of neural networks

While building neural networks, a lot of beginners and non-beginners alike seem to get stuck figuring out the input shape that needs to be fed into the network.

But why should we know the input shape, and why should we feed it? Can’t the Neural Network figure it out on its own?
The answer to this question lies in the basics of matrix multiplication.

Suppose we have two matrices A and B. Let the dimensions of B be m rows x n columns. Now, for the two matrices to be compatible for multiplication, the column dimension…
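The compatibility rule can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the article: the function name `matmul_shape` and the example shapes are assumptions made for the example. It relies only on the standard rule that a product A·B is defined when the number of columns of A equals the number of rows of B.

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of A @ B, or raise ValueError if the shapes are incompatible."""
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    # A @ B is defined only when A's column count equals B's row count.
    if a_cols != b_rows:
        raise ValueError(
            f"cannot multiply {a_shape} by {b_shape}: {a_cols} != {b_rows}"
        )
    # The result keeps A's rows and B's columns.
    return (a_rows, b_cols)


# Hypothetical example: a batch of 32 samples with 10 features each,
# multiplied by a weight matrix that maps 10 features to 4 units.
batch = (32, 10)
weights = (10, 4)
print(matmul_shape(batch, weights))  # (32, 4)
```

This is why a network's first layer needs to know the number of input features: the row count of its weight matrix has to match it.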

An introduction to Pattern Exploiting Training

The overwhelming world of mammoth language models

Ever since the advent of transfer learning in Natural Language Processing, larger and larger models have been presented, in order to make more and more complex language tasks possible.

But the more complex the model, the more time and data it needs to train. The latest GPT-3 model achieves state-of-the-art results in most natural language tasks, but it has 175 billion parameters to train, and training it on a single GPU would take years!

So is there a way around it?

Timo Schick and Hinrich Schütze came up with an ensemble masked language model training method which has proven to be as potent…

How to prepare for each round

Earlier this year, I received an interview invitation from Amazon for the role of Operations Research Scientist at their Luxembourg office. The invite came with a link to suggest possible time slots for the first round of interviews.

I was super excited when I saw the mail! I was looking to move to the EU, and had Amazon LUX on my radar! So I went ahead and gave a few options for scheduling the first round of interviews.

The First Round — HR Round

I also reached out to a few friends at Amazon for a heads-up on what I’d be facing in the first round.

Strategise your digital marketing efforts using insights from Social Network Analysis

An important part of digital marketing strategy is Social Media Marketing. It is the latest addition to a plethora of digital marketing tools, and is ever-evolving, with new entrants in the social network space.

Or any Tech job for that matter

It was January 2020. I was elated that I had cracked a Data Science role at a large Dutch international airline. I was asked to move to the Netherlands by April 2020. At the time, I was working as a Data Scientist and a Product Manager at my company, so transitioning to a Data Science role abroad was not difficult.

I now had three months to get my visa, resign from my current job (and serve the notice period), book my tickets, find temporary accommodation, and get a bank account, a tax ID and whatnot! Panic gripped me at the sheer…

And I was only using a simple Logistic Regression Classifier!

I had taken an online Master's course in AI and ML from UT Austin. For the Capstone project, they offered a choice between a Computer Vision based problem statement and a Natural Language Processing based problem statement.

Given my interest in NLP, I naturally chose the NLP project. The problem statement of the project was to build an IT Service Ticket Classification Model from a dataset of pre-labelled tickets. You can read about the project in this article.


I was given a set of IT tickets along with the categories they belonged to. There were a total of 73 ticket…

Revenue Management 101 — Part 1

Friday evenings are the worst time to book a flight to literally anywhere! And Monday mornings are just as bad! You know this by now. And if you are like me and many people I know, then you keep an eye on your airfare, waiting for it to go down. But can we do better at guessing price fluctuations?

You might think there is no rhyme or reason to how the ‘airline guys’ jack up the price of your ticket, or why it is sometimes ridiculously low. …

