What Do You Actually Do!? Episode 27: Harpal Sahota, Data Scientist

Today’s episode of What Do You Actually Do!? will look at the world of data science and what skills are required to work in the industry. We interviewed Harpal Sahota, who works as a Data Scientist at MagicLab, a company which runs dating apps such as Bumble & Lumen.

(Please note: this is a re-released episode, which was recorded in 2020.)

Subscribe on your favourite platform now.

About Harpal

Harpal currently works as a Data Scientist at MagicLab, which built and owns a number of popular dating applications. Previously, Harpal studied for a Master's in Computational Biology at the University of York, before completing a PhD in the same subject at the University of London. He then went on to work for companies such as YouGov and Zoopla, before settling at MagicLab, where he has worked since January 2020.

Useful Links

For Harpal’s data science blogs:

https://www.drdatascience.co.uk/home

For info about MagicLab:

https://magiclab.co/about

For more info about working in data science:

https://www.prospects.ac.uk/jobs-and-work-experience/job-sectors/information-technology

https://www.prospects.ac.uk/job-profiles/data-scientist

https://www.york.ac.uk/students/work-volunteering-careers/ideas/sectors/it/

For more resources for York science students: 

https://www.york.ac.uk/students/work-volunteering-careers/student-groups/science-careers/

To hear more tech and data related podcast stories: 

Digital Marketing with Tasha McNaught:

https://yorkcareers.wordpress.com/2019/11/06/what-do-you-actually-do-episode-17-tasha-mcnaught-digital-marketer/

Market Research internship with Julia Hebron:

https://yorkcareers.wordpress.com/2019/05/15/what-do-you-actually-do-episode-12-julia-cass-hebron-intern-at-make-it-york/

Transcript:

0:00  

Hello and welcome to this episode of What Do You Actually Do? My name is Kate Morris, and I’ll be your host today. In today’s episode, we’ll be talking about working in the tech sector. And today we’re joined by Harpal Sahota, who’s a data scientist at MagicLab. So Harpal, what do you actually do?

0:15  

Good question. So I am a data scientist at MagicLab, like you said. My job there is really to take the company’s data and derive insights from it that the company can then act upon, to help improve user experience, increase revenue and things like that. That’s my day-to-day job in a nutshell.

0:33  

So MagicLab, I had a look at their website. My understanding was they seem to develop apps for other people – is that right?

0:40  

So, they are in the dating sector.

0:42  

Yeah, I was wondering why there was lots of dating stuff on the website!

0:44  

So MagicLab is a top-level group, and beneath that are four apps. You’ve probably heard of two of them: Badoo and Bumble?

0:51  

Oh, yeah. 

0:51  

We also have Chappy and Lumen. These apps can target different kinds of markets, sectors or cultures, depending on the people who use them.

1:01  

So why do they need the data then? What do they use it for?

1:04  

So we can help match people together based on similar interests, but we also have a key priority of user safety. So we detect abusive or rude users, basically by what they write in their profiles. We can also detect hate symbols and weapons in images, because we don’t want users to see that and we don’t want to be affiliated with it, so we want to remove those images from profiles, essentially.

1:28  

So do you write software to detect those things? 

1:32  

So we use AI and machine learning to develop models that help us detect these profiles and either give them a warning or ban them completely.

1:40  

Okay, so what are the key elements of your role then if your objective is to kind of provide the company with this user data? What are you doing on a day to day basis?

1:54  

The company provides us with the data, and we then take that data and train models to find signals in it which can identify rude profiles or non-rude profiles and flag them up, essentially. So in that instance, a typical day will involve getting data, cleaning it, because that is a big aspect of the role, and then training a model to make predictions from the data we have, using the targets we give it.

2:18  

And do you then have to present that data back to your employers? Do you have to think of ways to make it understandable?

2:25  

Employers always want an ROI, basically a return on investment. So if you can prove that and show that you can add value to the company, then that’s a huge win for us. That’s a key element of the role as well. They’re not interested in a model that doesn’t add value to the business, essentially, so we always have to prove that the model we’re building is adding value.

2:42  

Okay. So what was your starting point then, and where did your interest in data science come from?

2:53  

Computer science built up my programming skills, and that got me into the data science field as well. That was half of my foot in the door, and when I heard about data science, I really liked what I heard and decided I wanted to get into that field. So I tailored my career towards it.

3:11  

So you did a PhD in Computational Biology. How has that impacted on your career? Was that with a view to get into data science in the future and you thought that would be useful, or did doing that PhD help you decide where you wanted to go?

3:24  

Unfortunately, a lot of roles in data science require a PhD, but that’s changing. People are now showing that you don’t need a PhD to be a data scientist, but a lot of companies use it as a filter, simply because they get overwhelmed by applicants. But you don’t need a PhD. As long as you have good programming skills and good maths skills, you have a good shot at becoming a data scientist. For me back then, though, doing a PhD was a conscious choice to get into data science, because all the jobs required one, essentially.

3:29  

Okay. So you had your eyes on the prize?

3:55  

Yes, from the beginning!

3:58  

You’ve mentioned that you obviously need those analytical skills, and that you need to be able to explain the return on investment to your colleagues. What other skills or personal qualities would you say you need to be not just successful, but happy as a data scientist?

4:18  

In terms of personality wise?

4:20  

Whatever you feel. What is a good fit for it?

4:25  

I think you need to be methodical, because if you make a mistake, that’s on you and you will be accountable and responsible, especially, for example, if you’re building a model that tries to detect cancer in patients. If your model is wrong, and you’re predicting the patient has cancer, that’s a lot of stress on the patient, right? So you need to kind of take responsibility and be methodical and be accurate and make sure you’re aware of what you’re doing, and know the ins and outs of everything you’re doing.

4:54  

So, a really high attention to detail. But it sounds like integrity as well, because I guess you might be tempted to think, ‘Oh, no, I’ve realised there’s this error, but actually, I can cover it up because I don’t want to start again.’

5:05  

Exactly. You can’t fudge the results, because you will get caught out eventually. When the model is used by the company and it doesn’t match what you’re saying it does, you’re gonna get caught out. So you need to have integrity there as well, you need to be honest. Honesty is the best policy.

5:21  

Especially in the dating game! Is it quite a high-pressure job? Are you doing long hours? What’s the lifestyle like?

5:34  

It’s a bit of both really. It can be quite a high pressure job because you’re dealing with very senior stakeholders and projects, sometimes you deal with board members as well. So you need to kind of have that integrity to say, ‘I disagree with you’ as well. Essentially, you need to be kind of egotistical but confident in what you’re saying and what you’re doing. So confidence is a key aspect as well. 

5:55  

And would you say you had that coming in because of your experiences on the PhD, or was that something that came more as you got into the role?

6:02  

I think the PhD definitely helped develop those skills, because in a PhD you give presentations, people try to disagree with you about your work, and you have to defend it. So you develop those skills naturally by doing a PhD. That definitely did help.

6:18  

And what about work experience? You work at MagicLab, but you’ve worked for a few different organisations. What was your first job? How did you secure that?

6:29  

So once I finished my PhD, I decided to apply for some data science roles, but I wasn’t really getting anywhere because companies were inundated with applicants. So I went through a recruiter, and that helped me get my first job, which was in recruitment.

6:45  

So it was a recruitment agency who specialised in data science stuff, but then you’re saying you ended up working for that recruitment agency?

6:53  

No, no. I worked for a company that helps recruiters.

6:56  

Ah, right. What would you say you really love about the job – what is the best bit of it and what’s the kind of the worst bit of it?

7:06  

The best bit is definitely building the models: seeing the model actually learn something from the data, and showing that it can make accurate predictions. That’s the best bit because you know your model is working, and it shows that we can add value to the company, which proves your own worth as well, in a sense. I’d say the worst part is sometimes getting the data, because it can be very difficult. Sometimes there’s not enough of it. But the worst of the worst is cleaning the data. It’s such a tedious and painful process. We spend around 80% of our time cleaning data and 20% of our time building models.

7:38  

So what does cleaning the data mean?

7:41  

So for example, say in real estate, agents can upload a listing, but they might leave out the number of bedrooms, or what type of property it is (is it a house or a flat, for example?), so that data field is missing. You have to clean that data: either go to the original listing and try to work out whether it’s a flat or a house, or how many bedrooms it has. Then you also have to make a conscious decision. Do you remove that data point, or do you try to fill it in or manipulate it in some way?
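To make the "remove it or fill it in" decision concrete, here is a minimal Python sketch, assuming listings are held as plain dictionaries with `None` marking a field the agent left blank. The listing data and the helper names `drop_incomplete` and `fill_missing` are hypothetical, and filling gaps with the median bedroom count and the most common property type is just one common imputation choice, not necessarily what Harpal or any of the companies mentioned actually do.

```python
from collections import Counter
from statistics import median

# Hypothetical property listings; None marks a field the agent left blank.
listings = [
    {"id": 1, "property_type": "flat", "bedrooms": 2},
    {"id": 2, "property_type": "house", "bedrooms": None},
    {"id": 3, "property_type": None, "bedrooms": 3},
    {"id": 4, "property_type": "flat", "bedrooms": 1},
]


def drop_incomplete(rows, fields):
    """Option 1: remove any listing with a missing value in `fields`."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


def fill_missing(rows):
    """Option 2: impute gaps using the median bedroom count and the most
    common property type among listings that do have those values."""
    known_beds = [r["bedrooms"] for r in rows if r["bedrooms"] is not None]
    known_types = [r["property_type"] for r in rows if r["property_type"] is not None]
    beds_fill = median(known_beds)
    type_fill = Counter(known_types).most_common(1)[0][0]

    filled = []
    for r in rows:
        r = dict(r)  # copy so the raw data is left untouched
        if r["bedrooms"] is None:
            r["bedrooms"] = beds_fill
        if r["property_type"] is None:
            r["property_type"] = type_fill
        filled.append(r)
    return filled
```

Dropping rows keeps only listings 1 and 4 here, losing half the data, while filling keeps every row at the cost of guessed values. As Harpal notes, going back to the original listing is the more accurate fix when it is possible; imputation is a fallback when it is not.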

8:08  

So there’s a lot of problem solving as well, then, because you’re having to research that information?

8:13  

So there’s a lot of little problems that come together to make up one big problem, essentially.

8:17  

And again, I guess in terms of temperament, not kind of freaking out when that happens, just being able to be flexible and adaptable. 

8:25  

Exactly, yeah. You definitely have to be adaptable, because things will go wrong, and you have to figure out why. When they do go wrong, speak to the people who know best how to solve it. If you can’t solve it, try to figure out how you could. If you still can’t, then you need to move on and do something else.

8:39  

And you’ve got some blogs. Is it one or two blogs that you’ve got?

8:43  

I’ve got two blogs.

8:44  

So did you start those back in the day when you were thinking, ‘I want to go into data science and this is a useful thing to do’ or is that a more recent activity?

8:53  

That was back in the day, to put my foot in the door in the data science field, because I knew I needed something that differentiated me from other applicants. It was a big hit with employers, because they saw that I was active in the field and contributing as well.

9:07  

But you still write on there, don’t you?

9:09  

Yeah. 

9:10  

So is that because you just really enjoy the visualisations?

9:13  

Exactly. A key aspect of data science is also data visualisation, and that’s what I do on my blog. It’s a way of showing data in a unique and different way, which I enjoy doing. I think that’s really important, because otherwise you just show bar graphs and it can get really boring.

9:32  

So it sounds like it’s a really interesting area to work in. It’s highly competitive, with lots of people going for it. Having a PhD was the standard; that might be shifting, but you’ve still got to differentiate yourself. So doing things like blogging, or trying to get some work experience: you’ve got to stand out from the crowd in some way.

9:49  

Absolutely. I would also recommend attending meetups, because networking is really important as well. The more meetups you go to and the more people you speak to, the more they start to recognise you. They start talking about roles, they may offer you a role somewhere else, or they may know someone who can help you out. The data science field is very small: if you know somebody, they will know somebody who knows the person you need to speak to. So networking is probably one of the key aspects there.

10:13  

So how would someone find out about these meetups then?

10:15  

Just go to meetup.com and search for data science meetups in your local area, and there’s bound to be plenty around.

10:22  

And anyone can go to those?

10:24  

They are all free to go to.

10:27  

All right, well we’ll put details of that on our website. And just thinking ahead for any students or recent grads who want to break into the sector. What do you think the key challenge will be for data science or the tech sector in general over the next few years?

10:43  

I think what is really going to be important in the future is data ethics. At the minute, our models can be quite biased, in the sense that they can sometimes pick up on gender when they shouldn’t, so gender can be a big source of bias in our models, and we try to remove it from them. So data ethics is going to become really important. If you want to become a data scientist, learning about data ethics helps you future-proof yourself as well, so I recommend learning about it.

11:13  

That sounds really interesting. And I guess, with the constant craziness and change happening in the world all the time, being able to show that you’ve got those qualities of ethics and integrity, and that understanding, could be a really good thing?

11:29  

Absolutely, one hundred percent. 

11:31  

For more info about the career areas we’ve mentioned today, I’m going to add some relevant links to the episode description and a link to the full transcript of today’s show. But Harpal, thank you so much for coming in and making the journey up here.

11:42  

You’re welcome. Thanks for the invite.

11:45

Thanks for joining us this week on What Do You Actually Do? This episode was hosted by myself, Kate Morris, and edited and produced by the Careers & Placements team. If you love this podcast, spread the word and subscribe. Are you eager to get more tips? Follow University of York Careers & Placements on YouTube, Twitter, Facebook and Instagram. All useful links are in this episode description. This has been produced at the University of York Careers & Placements. For more information, visit york.ac.uk/careers