Data Scientists: Are You Getting Paid Enough?
Fact: Theatrical makeup artists (at $124k) earn more than data scientists (at $109k) in the US (BLS). My take: A lot of data scientists are being underpaid.
I covered data engineering salaries before, but there is also a treasure trove of salary data on Reddit in r/DataScience, which I wanted to dig deeper into.
In 2019, 2020 and 2021 a post ran that looked like this:
A typical comment looks like the one below, with a well-structured format:
This meant I'd be able to scrape the comment data and use it to build a nice table.
At this point, I should say that in a community full of data scientists, I'm not the first person to have this idea, and there are at least three other posts analyzing the data. However, these scraped only a fraction of the total data, and I thought there was a lot more insight to be had if I got a bit creative.
In particular, I wanted to find out:
- How fast have data science salaries been increasing, given recent inflation?
- What's the best way to increase your salary, if you're willing to make active changes to do so?
The rough process I followed was:
- Extract the data from the Reddit comments
- Parse the data into a table so that I could easily analyze it
- Clean the data and tag it
- Analyze the data and present results
1. Extracting the data with DevTools
I used Chrome's DevTools to find the requests that sent back comment data. It took a bit of searching, but eventually I found them.

The requests sent back a data file in JSON format. For example:
{
  "account": null,
  "authorFlair": {...},
  "commentLists": {...},
  "comments": {
    "t1_ghe6iex": {
      ...
      "media": {
        "richtextContent": {
          "document": [
            {"c": [
              {"c": [{"c": [{"e": "text","t": "Title: Data Scientist","f": [[1,0,6]]}],"e": "par"}],"e": "li"},
              {"c": [{"c": [{"e": "text","t": "Tenure length: 3yrs","f": [[1,0,14]]}],"e": "par"}],"e": "li"},
              {"c": [{"c": [{"e": "text","t": "Location: Houston","f": [[1,0,9]]}],"e": "par"}],"e": "li"},
              {"c": [{"c": [{"e": "text","t": "Salary: $140,000","f": [[1,0,7]]}],"e": "par"}],"e": "li"},
              {"c": [{"c": [{"e": "text","t": "Company/Industry: Oil and Gas","f": [[1,0,17]]}],"e": "par"}],"e": "li"},
              {"c": [{"c": [{"e": "text","t": "Education: Masters in Applied Statistics","f": [[1,0,10]]}],"e": "par"}],"e": "li"},
              {"c": [{"c": [{"e": "text","t": "Prior Experience: 2yrs of actuarial experience","f": [[1,0,17]]}],"e": "par"}],"e": "li"},
              {"c": [{"c": [{"e": "text","t": "Relocation/Signing Bonus: $15,000 signing bonus","f": [[1,0,25]]}],"e": "par"}],"e": "li"},
              {"c": [{"c": [{"e": "text","t": "Stock and/or recurring bonuses: 15-30% bonus(no bonus this year of course due to Covid)","f": [[1,0,31]]}],"e": "par"}],"e": "li"},
              {"c": [{"c": [{"e": "text","t": "Total comp: $140,000","f": [[1,0,11]]}],"e": "par"}],"e": "li"}],"e": "list","o": false},
            {"c": [{"e": "text","t": "I'm about to accept a new job that will be include a nice paycut (125K) just to get out of O&G.The industry is on a downturn and I think now is a good time move on.The premium pay is no longer worth the instability."}],"e": "par"}
          ]
        },
        "type": "rtjson",
        "rteMode": "richtext"
      },
      "profileImage": "https://styles.redditmedia.com/t5_mb2hi/styles/profileIcon_snoo1ac41e44-c7ed-4194-9f09-48672b506ee0-headshot.png?width=256&height=256&crop=256:256,smart&s=0e6131dfcf0c758d2c28fb08b8dbae7ebf688161"
    },
    // & MANY MORE COMMENTS
  }
}
Reddit "lazy loads" - it doesn't show all the comments until you scroll down. So I scrolled until all the comments were loaded, and grabbed the data from all the requests. There were three requests with data per yearly thread: nine in all.
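To make the nesting in that response a little more concrete, here is a minimal sketch of how the text could be pulled out of one comment by walking the tree recursively. It assumes only what the sample above shows: nodes carry an element type in "e", text in "t" and children in "c". The helper names `iter_text` and `comment_lines` are hypothetical, and this is an alternative to the positional indexing used later in the article, not the code that was actually run.

```python
def iter_text(node):
    """Yield every text string found in an rtjson node, depth first."""
    if isinstance(node, dict):
        if node.get("e") == "text" and "t" in node:
            yield node["t"]
        for child in node.get("c", []):
            yield from iter_text(child)
    elif isinstance(node, list):
        for item in node:
            yield from iter_text(item)


def comment_lines(comment):
    """Return the text lines of one comment dict, or [] if it has no rich text."""
    media = comment.get("media") or {}
    document = (media.get("richtextContent") or {}).get("document", [])
    return list(iter_text(document))


# A cut-down comment in the same shape as the sample response
sample = {
    "media": {
        "richtextContent": {
            "document": [
                {"c": [
                    {"c": [{"c": [{"e": "text", "t": "Title: Data Scientist"}], "e": "par"}], "e": "li"},
                    {"c": [{"c": [{"e": "text", "t": "Salary: $140,000"}], "e": "par"}], "e": "li"},
                ], "e": "list", "o": False},
                {"c": [{"e": "text", "t": "Free-text note"}], "e": "par"},
            ]
        }
    }
}
lines = comment_lines(sample)
```

Because the walk does not care whether the fields sit in a bulleted list or in plain paragraphs, it returns the same flat list of lines for both layouts.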
2. Parsing the data into a table with Python
Not every poster is kind enough to conform rigidly to the above format. Some didn't include all of the fields, or didn't break lines after each field:
This meant I needed a couple of different approaches to parse the data. So I opened a Jupyter notebook and wrote a few lines of Python to parse the JSON files.
import json
import pandas as pd

# Optional fields that would otherwise shift the columns out of alignment
SKIP_FIELDS = ("Remote:", "Internship:", "Details:")
# Errors raised when a comment does not have the nesting a lookup expects
LOOKUP_ERRORS = (KeyError, IndexError, TypeError)

# run it for each post file
dates = ['2021', '2020', '2019']
pages = ['1', '2', '3']
array = []
for page in pages:
    for date in dates:
        with open(date + '_' + page + '_post.json', 'r') as f:
            data = json.load(f)
        for comment in data.get("comments", {}).values():
            row = [date, page]
            for i in range(0, 11):
                try:
                    # if the data is in a bulleted list, this works
                    value = comment['media']["richtextContent"]["document"][0]['c'][i]['c'][0]['c'][0]['t']
                except LOOKUP_ERRORS:
                    try:
                        # if the data is in a list, but has a non-list sentence first. (Posters often add a preamble)
                        value = comment['media']["richtextContent"]["document"][1]['c'][i]['c'][0]['c'][0]['t']
                    except LOOKUP_ERRORS:
                        try:
                            # this works if the data is not in a list
                            value = comment['media']["richtextContent"]["document"][i]['c'][0]['t']
                        except LOOKUP_ERRORS:
                            continue
                # Strips out some optional fields, which otherwise disrupt the columns
                if not any(field in value for field in SKIP_FIELDS):
                    row.append(value)
            # drop rows with fewer than 6 entries (date, page and at least 4 fields) -
            # these tend to be comments that do not contain salary data (which have 8-10 lines)
            if len(row) > 5:
                array.append(row)

df = pd.DataFrame(array)
# Assigning 13 names assumes at least one comment produced all 11 fields
df.columns = ['date', 'page', 'title', 'tenure', 'location', 'salary', 'industry', 'education',
              'prior_experience', 'signing_bonus', 'stock_or_bonus', 'total_comp', 'extra_col']
df.to_csv('salary_data.csv', index=False)
Having run this, I had a table of 311 rows of data. But it was a bit of a mess, with issues including:
- Duplicate rows
- Rows with non-salary data comments
- Rows with misaligned columns
On top of this, the columns mainly contain free-text. E.g. the salaries are in different formats, and different currencies.
Date | Title | Tenure | Location | Salary | Industry | Education | Prior Experience | Signing Bonus | Stock Or Bonus | Total Comp |
---|---|---|---|---|---|---|---|---|---|---|
1905 | Title: Data Analyst | Tenure length: Accepting in a couple of days | Location: London, UK | Salary: 50k GBP | Company/Industry: FinTech | Education: BSc Maths with Stats | Prior Experience: 2 years Data Analyst | Bonus: Up to 15%, typically 10% apparently | - | - |
1905 | Title: Senior Data Scientist | Tenure length: < 1 year at this position. Held 3 DS positions at 3 companies in 2 years. | Location: NYC / remote | Salary: 205k | Industry: Tech (FAANG-adjacent) | Education: BA Poli Sci | Prior Experience: 4 years Data Analyst, 2 years DS | Stock and/or recurring bonuses: $297k RSUs (publicly traded company) yearly | Total comp: $502k | - |
1905 | Yeah sure. | I had been promoted from Data Analyst to entry-level (L3) DS, then again from L3 to L4, at company 1. That took place over a period of just over 2 years. I then went to company 2 (FAANG), which involved a promo to L5. I did not like this FAANG company, so I jumped to company 3, also at L5, but with significantly higher comp. | Company 1 to company 2 wasn't a super fast jump - 2 years - and it involved a level change and a jump in prestige, so that one didn't raise any eyebrows. | You will probably be able to guess what company 2 is from this, but let's just say it was a company with some prominent ethical issues playing out very, very publicly. Jumping from this company was an easy narrative to sell, as I was jumping because of those issues specifically. | I think the best way to summarize this history is in two points: | - | - | - | - | - |
1905 | Title: Data Scientist, Analytics Intern | Location: New York City | Salary: $7700 per month | Company/Industry: FAANG | Education: Senior year in undergrad | Relocation/Signing Bonus: Free relocation, $300 to ship personal items, reimbursement for transportation and mental/physical health needs, health insurance, choice between corporate housing or stipend. | - | - | - | - |
1905 | Title: Lead Data Scientist | Tenure length: 1.5 years | Location: São Paulo, Brazil | Salary: $55k USD (310k BRL) | Company/Industry: Tech/O&G/Mining/IoT/Other pre-IPO spinoff (we are an AI/MLE consultancy, most clients are in O&G or Mining). | Education: BS Geological Engineering, MS Mechanical Engineering | Prior Experience: 2.5 years as a DS in oil exploration between startups and a F500 O&G company. | Stock and/or recurring bonuses: No idea, I have equity but the company is less than a year old*. | - | - |
1905 | Title: Analytics Engineering Manager | Tenure length: 1 year current role; 6 prior years along data analyst track, ending at Sr Data Analyst | Location: Pacific Northwest, USA (hybrid remote) | Salary: $150k | Company/Industry: SaaS | Education: BS Economics; BA Int’l Studies | Prior Experience: 4 years customer success | Relocation/Signing Bonus: None | Stock and/or recurring bonuses: 15% bonus; ~$70k annual RSUs. | Total comp: ~$240k |
1905 | Title: Data Scientist | Tenure length: 4 years at company (1 has DS) | Location: Montreal | Salary: 95k$ (CAD) | Company/Industry: Oil and Gas | Education: Bachelor in mechanical engineering (almost done Msc in software) | Prior Experience: None | Relocation/Signing Bonus: N/A | Stock and/or recurring bonuses: 10% | - |
1905 | Title: Data Scientist | Tenure length: 2 years | Location: SF/Bay Area | Salary: $187k + bonus | Company/Industry: Startup, tech. (I figure out and invent paths forward for new potentially impossible tech, so it's a bit different than standard business DS/DA type work.) | Education: None. I got in before the DS title was used in silicon valley. | Prior Experience: 11 years | $Coop: No. | Relocation/Signing Bonus: No, but they tend to do that here. | - |
1905 | Title: Senior Data Scientist/Applied Scientist | Tenure length: Offer | Location: NYC | Salary: 175k | Company/Industry: E-commerce | Education: BS, MS in Math/Stats | Prior Experience: 3 YOE | Stock and/or recurring bonuses: 10% target bonus, 400k/4 years | Total comp: 292k | - |
1905 | Title: VP of Data Science | Tenure length: 6 years: 1 @ VP, 2 @ director, 2 @ manager, 1 @ data scientist | Location: Boston Area. WFH optional. I go in 1-2 days/week. | Salary: $200k base, $40k bonus target | Company/Industry: Marketing agency, ~500 people | Education: PhD in STEM field. BA in Physics. | Prior Experience: Postdoc related to PhD, then Insight Data Science | Relocation/Signing Bonus: None | Stock: Equity bonus equivalent to about 10% of salary yearly | Total comp: ~$260k |
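One way to reduce the misaligned columns seen in the table above would be to place each value by its label rather than by its position. The sketch below is an assumption about how that could look, not the approach used in this article: the label spellings are taken from the sample rows, while `FIELD_COLUMNS` and `align_fields` are hypothetical names.

```python
# Known labels (lower-cased) mapped to the column they belong in
FIELD_COLUMNS = {
    "title": "title",
    "tenure length": "tenure",
    "location": "location",
    "salary": "salary",
    "company/industry": "industry",
    "industry": "industry",
    "education": "education",
    "prior experience": "prior_experience",
    "relocation/signing bonus": "signing_bonus",
    "stock and/or recurring bonuses": "stock_or_bonus",
    "total comp": "total_comp",
}


def align_fields(lines):
    """Build a {column: value} record from 'Label: value' lines, ignoring unknown labels."""
    record = {}
    for line in lines:
        label, sep, value = line.partition(":")
        column = FIELD_COLUMNS.get(label.strip().lower())
        # Keep the first occurrence of each column and skip free-text lines
        if sep and column and column not in record:
            record[column] = value.strip()
    return record


intern_row = align_fields([
    "Title: Data Scientist, Analytics Intern",
    "Location: New York City",
    "Salary: $7700 per month",
])
chatty_row = align_fields(["Yeah sure.", "I think the best way to summarize this history is in two points:"])
```

A comment that skips a field then simply leaves that column empty instead of shifting everything to the left, and a reply with no recognised labels produces an empty record that is easy to drop. Variant labels such as "Bonus:" or "Stock:" would need their own entries in the mapping.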
3. Data cleaning
I re-used some of my code from cleaning data engineering salary posts, but for some of the columns I had to do some extra work. The aim of the cleaning was:
- Extract continuous data columns (salary, tenure): extract the numbers, detect and standardize the units (USD, years of tenure)
- Group categorical data columns (title, location, industry, education): try to categorize the responses into a manageable number of groups
- Remove erroneous and duplicate rows
For most of the cleaning I used a rule-based approach.
E.g. if the salary contains "k", multiply by 1,000; if it contains EUR, multiply by the EUR-USD FX rate; and so on.
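As a rough illustration of those rules, here is a minimal sketch of a salary parser. It is not the cleaning code used for this article: the function name `parse_salary` is hypothetical, and the FX rates in `FX_TO_USD` are placeholder values chosen for the example, not real exchange rates.

```python
import re

# Placeholder FX rates to USD, for illustration only (assumed values)
FX_TO_USD = {"USD": 1.0, "EUR": 1.1, "GBP": 1.3, "CAD": 0.8}


def parse_salary(text, fx=FX_TO_USD):
    """Return the first number in the text as whole USD, or None if there is none."""
    cleaned = text.lower().replace(",", "")
    # First number in the string, optionally followed by a standalone "k"
    match = re.search(r"(\d+(?:\.\d+)?)\s*(k\b)?", cleaned)
    if match is None:
        return None
    amount = float(match.group(1))
    if match.group(2):
        amount *= 1000
    # Default to USD unless another known currency code appears as a word
    currency = "USD"
    for code in fx:
        if code != "USD" and re.search(rf"\b{code.lower()}\b", cleaned):
            currency = code
    return round(amount * fx[currency])


examples = {
    "Salary: $140,000": parse_salary("Salary: $140,000"),
    "Salary: 50k GBP": parse_salary("Salary: 50k GBP"),
    "Salary: 95k$ (CAD)": parse_salary("Salary: 95k$ (CAD)"),
    "52.5k": parse_salary("52.5k"),
}
```

The sketch takes the first number it finds, so a value like "$55k USD (310k BRL)" resolves to the USD figure. It does not annualize monthly figures such as "$7700 per month", which would need a further rule.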
However, there were two columns where categorizing was pretty hard: location, and industry. So I enlisted my friend AI.
3.1 Using OpenAI to clean location data
I began by using a rule-based approach to categorize countries, but it turns out almost all the data is from the US.
Instead, I decided to compare the different regions in the US. But the raw data has a real mix of place hierarchies, which makes a rule-based approach arduous.
Location |
---|
fully remote |
lcol midwest city |
southern, usa |
lcol midwest |
washington dc |
atlanta |
socal |
west coast |
karachi pakistan |
washington, dc area |
I'm not an ML engineer, so I wasn't going to write my own model. However, OpenAI has a text completion model (free account needed) that I used as a classifier. It's pretty remarkable - you just pass it some text, and it autocompletes it for you.
I passed it the following:
The following is a list of places in the US
lcol midwest city
southern, usa
midwest
washington dc
atlanta
socal
karachi pakistan
west coast
....
The following is a list of regions they fit into:
Midwest
Northeast
Southeast
Southwest
West coast
Unspecified
Non-US
lcol midwest city - Midwest;
You click the Submit button in the UI and voila, it autocompletes it for you based on the instructions you gave it:
lcol midwest city - Midwest;
southern, usa - Southeast;
midwest - Midwest;
washington dc - Northeast;
atlanta - Southeast;
socal - Southwest;
karachi pakistan - Non-US;
west coast - West coast;
...
Pretty cool given how little we tell it about the data. Above you can see it correctly classifies Karachi, Pakistan as Non-US.
I then used the output to map into the original data.
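The prompt and the mapping step can be sketched in a few lines of Python. This is a hedged illustration rather than the code behind the article: it makes no API call (the completion text is pasted in by hand, as with the UI), the helper names and the `us_region` column are hypothetical, and sending unmatched places to "Unspecified" is an assumption.

```python
import pandas as pd

REGIONS = ["Midwest", "Northeast", "Southeast", "Southwest",
           "West coast", "Unspecified", "Non-US"]


def build_prompt(places, seed_line):
    """Assemble the prompt: the places, the regions, then one worked answer."""
    return "\n".join(
        ["The following is a list of places in the US", *places,
         "The following is a list of regions they fit into:", *REGIONS,
         seed_line]
    )


def parse_completion(text):
    """Turn 'place - Region;' lines into a {place: region} dict."""
    mapping = {}
    for line in text.splitlines():
        place, sep, region = line.rpartition(" - ")
        if sep:
            mapping[place.strip()] = region.strip().rstrip(";")
    return mapping


def add_region(df, mapping):
    """Map the lower-cased location onto a region; unmatched places become 'Unspecified'."""
    keys = df["location"].str.lower().str.strip()
    return df.assign(us_region=keys.map(mapping).fillna("Unspecified"))


completion = """lcol midwest city - Midwest;
southern, usa - Southeast;
karachi pakistan - Non-US;
..."""
mapping = parse_completion(completion)
tagged = add_region(pd.DataFrame({"location": ["Karachi Pakistan", "southern, usa"]}), mapping)
```

Splitting on the last " - " keeps place names that contain commas or slashes intact, and lines without the separator (such as the trailing "...") are skipped.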
After all the cleaning, it's not perfect, but it's pretty good:
Date | Title | Tenure | Tenure Clean | Location | US Region | Salary | Salary ($) | Industry | Industry Group | Education | Education Level |
---|---|---|---|---|---|---|---|---|---|---|---|
2019 | senior data scientist | 2.5 | 2.50 | denver metro | West coast | $151,000 | $151k | internet/web tech | Tech | ms | Master's |
2019 | biostatistician data scientist | 6 months | 0.50 | miami/ft. lauderdale | Southeast | 52.5k | $53k | clinical research organization | Healthcare | b.s. statistics | Bachelor's |
2019 | data scientist | 2yrs | 2.00 | houston | Unspecified | $136,000 | $136k | oil and gas | Oil, gas & mining | masters in applied statistics | Master's |
2019 | data scientist | 1 year | 1.00 | midwest | Midwest | 83,000 | $83k | fortune 100 | Manufacturing | ms statistics | Master's |
2019 | data scientist | 1.5 years | 1.50 | bay area | West coast | $155k | $155k | fb | Big Tech (FAANG) | phd in engineering | PhD |
2019 | data scientist | 2.5 yrs | 2.50 | irving, tx | Southwest | 130k | $130k | entertainment | Other industry | msc psychology, msc data science | Master's |
2019 | data scientist | < 1 yr | 1.00 | nyc | Northeast | 165k | $165k | healthtech, reinforcement learning | Healthcare | bachelor's | Bachelor's |
2019 | senior data scientist | been here 1 month | - | la | West coast | 160k | $160k | tech | Tech | bs engineering/ ms in ds | Master's |
2019 | data scientist | 3 months in current role, 2 years as data analyst | 0.25 | st. louis | Midwest | ~$95,000 | $95k | healthcare | Healthcare | masters | Master's |
2019 | data scientist | starting spring 2020 | - | seattle | West coast | 118,000 | $118k | big tech company | Tech | bs in cs, data science. completing ms in cs | Master's |
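The step from the Tenure column to the Tenure Clean column in the table above can be approximated with a small rule-based parser. The sketch below is an assumption about how such rules might look, with a hypothetical function name `parse_tenure`; it does not reproduce the exact rules used for the table (for instance, it converts "been here 1 month" to a fraction of a year, where the table leaves that row blank).

```python
import re


def parse_tenure(text):
    """Return tenure in years from free text, or None if no duration is found."""
    cleaned = text.lower().strip()
    # First "<number> <unit>" pair, e.g. "6 months", "2yrs", "< 1 yr"
    match = re.search(r"(\d+(?:\.\d+)?)\s*(years?|yrs?|months?|mos?)\b", cleaned)
    if match:
        value = float(match.group(1))
        per_year = 1.0 if match.group(2).startswith("y") else 1.0 / 12
        return round(value * per_year, 2)
    # A bare number such as "2.5" is treated as years
    if re.fullmatch(r"\d+(?:\.\d+)?", cleaned):
        return float(cleaned)
    return None


parsed = {text: parse_tenure(text) for text in [
    "2.5",
    "6 months",
    "2yrs",
    "< 1 yr",
    "3 months in current role, 2 years as data analyst",
    "starting spring 2020",
]}
```

Taking the first duration in the string means "3 months in current role, 2 years as data analyst" resolves to the current role, and a year such as "2020" with no unit is not mistaken for a tenure.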
4. Analyze the data
This whole article is written using Evidence, including the charts. It's a great alternative to BI tools for analyzing and presenting data when you want to add narrative (Disclosure: I work there).
4.1 Commenters are mostly highly educated and US-based
I started by exploring who our commenters were.
As may surprise no one, data scientists are pretty educated: Over 50% have either a Master's or a PhD.
Also, roughly 75% of the responses are from the US, with most from the West and Northeast.
4.2 Average data science salaries & experience
Histograms are generally a good fit for displaying continuous data.
The median data science salary is $115k. Dragged up by a few high values in the dataset, the mean salary is $120k.
The average data scientist in the dataset has 1.94 years of experience, and almost half of the posts are from people with 1 year of experience or less, so this data set is a reasonably junior sample.
4.3 Data science salaries are increasing
I looked at the trend of median, 25th percentile and 75th percentile salaries over time.
The median, 25th percentile and 75th percentile salaries all increased between the 2019 and 2021 threads:
- 25th percentile from $82k to $87k (+7%)
- Median salary from $110k to $120k (+9%)
- 75th percentile from $148k to $160k (+8%)
However, it is not a totally smooth trend (e.g. the 75th percentile in 2020 was lower than in 2019). Relatively small sample sizes might be causing noise here.
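For readers who want to reproduce this kind of trend table, here is a minimal pandas sketch. The column names `date` and `salary_usd` are assumptions about the cleaned data, and the numbers in the toy frame are made up for the example. Because the 2019 and 2021 threads are two years apart, the sketch also includes a helper that converts a total change into a per-year rate.

```python
import pandas as pd


def salary_percentiles(df):
    """25th, 50th and 75th percentile salary for each thread year."""
    table = df.groupby("date")["salary_usd"].quantile([0.25, 0.5, 0.75]).unstack()
    table.columns = ["p25", "median", "p75"]
    return table


def pct_change(table, start, end):
    """Percentage change in each percentile between two thread years."""
    return ((table.loc[end] / table.loc[start] - 1) * 100).round(1)


def annualized(total_pct, years):
    """Convert a total percentage change over several years into a yearly rate."""
    return ((1 + total_pct / 100) ** (1 / years) - 1) * 100


# Toy data, not the Reddit dataset
toy = pd.DataFrame({
    "date": ["2019"] * 3 + ["2021"] * 3,
    "salary_usd": [80, 100, 120, 90, 110, 130],
})
table = salary_percentiles(toy)
change = pct_change(table, "2019", "2021")
yearly = annualized(9, 2)  # a 9% rise over two years is a little over 4% per year
```

With small samples per year, printing the row counts next to the percentiles helps to judge how much of the movement is noise.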
4.4 Gaining experience quickly boosts your comp
The most passive way to increase your salary would be to just keep working to gain experience. Let's look at how median salaries change with tenure:
In the first 5 years, salaries increase from $110k to $160k. After this, the sample size is much smaller, but it appears to flatten off.
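A grouping like this can be sketched with `pd.cut`, which bins a continuous column before aggregating. The column names `tenure_years` and `salary_usd`, the bucket edges and the toy numbers are all assumptions for illustration; the count column is included because the later buckets are thin.

```python
import pandas as pd


def median_by_tenure(df, edges=(0, 1, 2, 3, 5, 10)):
    """Median salary and sample size per tenure bucket (empty buckets are dropped)."""
    buckets = pd.cut(df["tenure_years"], bins=list(edges), include_lowest=True)
    return df.groupby(buckets, observed=True)["salary_usd"].agg(median="median", n="count")


# Toy data, not the Reddit dataset
toy = pd.DataFrame({
    "tenure_years": [0.5, 1.0, 1.5, 4.0, 8.0],
    "salary_usd": [100, 110, 120, 160, 165],
})
by_tenure = median_by_tenure(toy)
```

Showing `n` alongside each median makes it easier to see where the curve is backed by only a handful of posts.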
4.5 Going back to school increases your salary
People often go back to college during an economic downturn, as there are fewer opportunities in the job market. But there's a debate about whether further degrees are really worth it. Is it really worth doing a Master's or PhD?
In data science at least, higher levels of education are correlated with higher salaries. Earning a Master's could net you +$15k salary on a Bachelor's, while upgrading a Master's to a PhD could be worth +$45k a year.
4.6 Relocating could get you a pay rise
Salaries across the US are different. Where is it most lucrative to work?
Unsurprisingly, the West (which includes the Bay Area) is the region with the highest salaries.
However, even without moving to the West coast you can change your salary significantly by relocating. Those in the Southeast could get a $20k raise if they relocated to the Northeast, Southwest or Midwest.
4.7 Changing industry may get you a bigger paycheck
Another way to increase your salary is to change jobs. But what kind of company should you target?
Perhaps unsurprisingly, landing a job at FAANG is a good way to increase your salary. After that, O&G, Tech and Healthcare are all good bets for higher salaries.
If you are working in consulting, manufacturing, retail or logistics - you might be able to get a $20-30k boost by changing industry.
Wrapping up: Top tips for a higher salary
In summary, from the data that's been posted on Reddit:
- Data science salaries went up 7-9% between the 2019 and 2021 threads. If your salary hasn't kept pace, talk to your manager about it, or consider looking around for a new role.
- There's a gulf in salaries between the different regions of the US: West coast salaries are almost double those in the Southeast.
- Education matters in data science: PhDs earn $60k more per year than those with Bachelor's only.
- Not all industries are equal: FAANG roles, other tech firms, O&G and Healthcare are the most lucrative.
I hope you found this useful! I certainly enjoyed exploring the data (cleaned version on GitHub). If there's anything else you'd like to see, let me know in the comments on Reddit!