Data Scientists: Are You Getting Paid Enough?

Data science-ing the way to higher pay with crowdsourced salary data
Archie Wood

Fact: Theatrical makeup artists (at $124k) earn more than data scientists (at $109k) in the US (BLS). My take: A lot of data scientists are being underpaid.

I covered data engineering salaries before, but there is also a treasure trove of salary data on Reddit in r/DataScience, which I wanted to dig deeper into.

In 2019, 2020 and 2021 a post ran that looked like this:

A typical comment looks like the one below, which has a well-structured data format:

This meant I'd be able to scrape the comment data and use it to build a nice table.

At this point, I should say that in a community full of data scientists, I'm not the first person to have this idea: there are at least three other posts analyzing the data. However, these scraped only a fraction of the total data, and I thought there was a lot more insight to be had if I got a bit creative.

In particular, I wanted to find out:

  • How fast have data science salaries been increasing, given recent inflation?
  • What's the best way to increase your salary, if you're willing to make active changes to do so?

The rough process I followed was:

  1. Extract the data from the Reddit comments
  2. Parse the data into a table so that I could easily analyze it
  3. Clean the data and tag it
  4. Analyze the data and present results

1. Extracting the data with DevTools

I used Chrome's DevTools to find the requests that sent back comment data. It took a bit of searching, but eventually I found it.

Finding Requests in Browser

The requests sent back a data file in JSON format. For example:

    "account": null,
    "authorFlair": {...},
    "commentLists": {...},
    "comments": {
        "t1_ghe6iex": {
            "media": {
                "richtextContent": {
                    "document": [
                        {"c": [
                            {"c": [{"c": [{"e": "text","t": "Title: Data Scientist","f": [[1,0,6]]}],"e": "par"}],"e": "li"},
                            {"c": [{"c": [{"e": "text","t": "Tenure length: 3yrs","f": [[1,0,14]]}],"e": "par"}],"e": "li"},
                            {"c": [{"c": [{"e": "text","t": "Location: Houston","f": [[1,0,9]]}],"e": "par"}],"e": "li"},
                            {"c": [{"c": [{"e": "text","t": "Salary: $140,000","f": [[1,0,7]]}],"e": "par"}],"e": "li"},
                            {"c": [{"c": [{"e": "text","t": "Company/Industry: Oil and Gas","f": [[1,0,17]]}],"e": "par"}],"e": "li"},
                            {"c": [{"c": [{"e": "text","t": "Education: Masters in Applied Statistics","f": [[1,0,10]]}],"e": "par"}],"e": "li"},
                            {"c": [{"c": [{"e": "text","t": "Prior Experience: 2yrs of actuarial experience","f": [[1,0,17]]}],"e": "par"}],"e": "li"},
                            {"c": [{"c": [{"e": "text","t": "Relocation/Signing Bonus: $15,000 signing bonus","f": [[1,0,25]]}],"e": "par"}],"e": "li"},
                            {"c": [{"c": [{"e": "text","t": "Stock and/or recurring bonuses: 15-30% bonus(no bonus this year of course due to Covid)","f": [[1,0,31]]}],"e": "par"}],"e": "li"},
                            {"c": [{"c": [{"e": "text","t": "Total comp: $140,000","f": [[1,0,11]]}],"e": "par"}],"e": "li"}],"e": "list","o": false},
                            {"c": [{"e": "text","t": "I'm about to accept a new job that will be include a nice paycut (125K) just to get out of O&G.The industry is on a downturn and I think now is a good time move on.The premium pay is no longer worth the instability."}],"e": "par"}
                "type": "rtjson",
                "rteMode": "richtext"
            "profileImage": ",smart&s=0e6131dfcf0c758d2c28fb08b8dbae7ebf688161"

Reddit "lazy loads" - it doesn't show all the comments until you scroll down. So I scrolled until all the comments were loaded, and grabbed the data from all the requests. There were three requests with data per yearly thread: nine in all.

2. Parsing the data into a table with Python

Not every poster is kind enough to conform rigidly to the above format. Some didn't include all of the fields, or didn't break lines after each field:

This meant I needed a couple of different approaches to parse the data. So I opened a Jupyter notebook and wrote a few lines of Python to parse the JSON files.
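A minimal sketch of what that parsing step might look like (not the notebook's actual code). It assumes each comment's `document` follows the nested shape in the excerpt above: nodes with an `"e"` type, optional `"t"` text and `"c"` children. The function names and the label-splitting regex are illustrative.

```python
import re

def collect_paragraphs(node, out=None):
    """Walk a rich-text node and return the text of every paragraph ("par") node."""
    if out is None:
        out = []
    if isinstance(node, list):
        for child in node:
            collect_paragraphs(child, out)
    elif isinstance(node, dict):
        if node.get("e") == "par":
            # Join the text fragments that sit directly inside this paragraph.
            parts = [c.get("t", "") for c in node.get("c", []) if isinstance(c, dict)]
            out.append("".join(parts).strip())
        else:
            collect_paragraphs(node.get("c", []), out)
    return out

def parse_fields(paragraphs):
    """Split "Label: value" lines into a dict; unlabeled lines are kept as notes."""
    fields, notes = {}, []
    for line in paragraphs:
        match = re.match(r"^([^:]{1,40}):\s*(.*)$", line)
        if match:
            fields[match.group(1).strip().lower()] = match.group(2).strip()
        else:
            notes.append(line)
    return fields, notes

# A cut-down document in the same shape as the excerpt above.
document = [
    {"e": "list", "o": False, "c": [
        {"e": "li", "c": [{"e": "par", "c": [{"e": "text", "t": "Title: Data Scientist"}]}]},
        {"e": "li", "c": [{"e": "par", "c": [{"e": "text", "t": "Salary: $140,000"}]}]},
    ]},
    {"e": "par", "c": [{"e": "text", "t": "Free-text note without a label"}]},
]
fields, notes = parse_fields(collect_paragraphs(document))
print(fields, notes)
```

Comments that put every field on a single line would need a second pass, for example splitting on the known field labels instead of on paragraphs.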

Having run this, I had a table of 311 rows of data. But it was a bit of a mess, with issues including:

  • Duplicate rows
  • Rows with non-salary data comments
  • Rows with misaligned columns

On top of this, the columns mainly contain free-text. E.g. the salaries are in different formats, and different currencies.

Raw salary data
Columns: date, title, tenure, location, salary, industry, education, prior_experience, signing_bonus, stock_or_bonus, total_comp

Row 1 (2021)
Title: Data Analyst
Tenure length: Accepting in a couple of days
Location: London, UK
Salary: 50k GBP
Company/Industry: FinTech
Education: BSc Maths with Stats
Prior Experience: 2 years Data Analyst
Bonus: Up to 15%, typically 10% apparently

Row 2 (2021)
Title: Senior Data Scientist
Tenure length: < 1 year at this position. Held 3 DS positions at 3 companies in 2 years.
Location: NYC / remote
Salary: 205k
Industry: Tech (FAANG-adjacent)
Education: BA Poli Sci
Prior Experience: 4 years Data Analyst, 2 years DS
Stock and/or recurring bonuses: $297k RSUs (publicly traded company) yearly
Total comp: $502k

Row 3 (2021)
Yeah sure.
I had been promoted from Data Analyst to entry-level (L3) DS, then again from L3 to L4, at company 1. That took place over a period of just over 2 years. I then went to company 2 (FAANG), which involved a promo to L5. I did not like this FAANG company, so I jumped to company 3, also at L5, but with significantly higher comp.
Company 1 to company 2 wasn't a super fast jump - 2 years - and it involved a level change and a jump in prestige, so that one didn't raise any eyebrows.
You will probably be able to guess what company 2 is from this, but let's just say it was a company with some prominent ethical issues playing out very, very publicly. Jumping from this company was an easy narrative to sell, as I was jumping because of those issues specifically.
I think the best way to summarize this history is in two points:

Row 4 (2021)
Title: Data Scientist, Analytics Intern
Location: New York City
Salary: $7700 per month
Company/Industry: FAANG
Education: Senior year in undergrad
Relocation/Signing Bonus: Free relocation, $300 to ship personal items, reimbursement for transportation and mental/physical health needs, health insurance, choice between corporate housing or stipend.

Row 5 (2021)
Title: Lead Data Scientist
Tenure length: 1.5 years
Location: São Paulo, Brazil
Salary: $55k USD (310k BRL)
Company/Industry: Tech/O&G/Mining/IoT/Other pre-IPO spinoff (we are an AI/MLE consultancy, most clients are in O&G or Mining).
Education: BS Geological Engineering, MS Mechanical Engineering
Prior Experience: 2.5 years as a DS in oil exploration between startups and a F500 O&G company.
Stock and/or recurring bonuses: No idea, I have equity but the company is less than a year old*.

Row 6 (2021)
Title: Analytics Engineering Manager
Tenure length: 1 year current role; 6 prior years along data analyst track, ending at Sr Data Analyst
Location: Pacific Northwest, USA (hybrid remote)
Salary: $150k
Company/Industry: SaaS
Education: BS Economics; BA Int’l Studies
Prior Experience: 4 years customer success
Relocation/Signing Bonus: None
Stock and/or recurring bonuses: 15% bonus; ~$70k annual RSUs.
Total comp: ~$240k

Row 7 (2021)
Title: Data Scientist
Tenure length: 4 years at company (1 has DS)
Location: Montreal
Salary: 95k$ (CAD)
Company/Industry: Oil and Gas
Education: Bachelor in mechanical engineering (almost done Msc in software)
Prior Experience: None
Relocation/Signing Bonus: N/A
Stock and/or recurring bonuses: 10%

Row 8 (2021)
Title: Data Scientist
Tenure length: 2 years
Location: SF/Bay Area
Salary: $187k + bonus
Company/Industry: Startup, tech. (I figure out and invent paths forward for new potentially impossible tech, so it's a bit different than standard business DS/DA type work.)
Education: None. I got in before the DS title was used in silicon valley.
Prior Experience: 11 years
$Coop: No.
Relocation/Signing Bonus: No, but they tend to do that here.

Row 9 (2021)
Title: Senior Data Scientist/Applied Scientist
Tenure length: Offer
Location: NYC
Salary: 175k
Company/Industry: E-commerce
Education: BS, MS in Math/Stats
Prior Experience: 3 YOE
Stock and/or recurring bonuses: 10% target bonus, 400k/4 years
Total comp: 292k

Row 10 (2021)
Title: VP of Data Science
Tenure length: 6 years: 1 @ VP, 2 @ director, 2 @ manager, 1 @ data scientist
Location: Boston Area. WFH optional. I go in 1-2 days/week.
Salary: $200k base, $40k bonus target
Company/Industry: Marketing agency, ~500 people
Education: PhD in STEM field. BA in Physics.
Prior Experience: Postdoc related to PhD, then Insight Data Science
Relocation/Signing Bonus: None
Stock: Equity bonus equivalent to about 10% of salary yearly
Total comp: ~$260k

3. Data cleaning

I re-used some of my code from cleaning data engineering salary posts, but for some of the columns I had to do some extra work. The aim of the cleaning was:

  • Extract continuous data columns (salary, tenure): extract the numbers, detect and standardize the units (USD, years of tenure)
  • Group categorical data columns (title, location, industry, education): try to categorize the responses into a manageable number of groups
  • Remove erroneous and duplicate rows
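As an illustration of the first aim, here is a hedged sketch of turning free-text tenure into years. It is not the cleaning code used for the article; the function name, the regex and the choice to keep only the first duration mentioned are all assumptions.

```python
import re

# Conversion factors keyed by the first letter of the unit: y(ears) or m(onths).
UNIT_IN_YEARS = {"y": 1.0, "m": 1.0 / 12.0}

def tenure_in_years(text):
    """Return the first duration found in free text as years, or None if there is none."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(years?|yrs?|months?|mos?)\b", text.lower())
    if not match:
        return None
    value = float(match.group(1))
    unit = match.group(2)[0]
    return round(value * UNIT_IN_YEARS[unit], 2)

for raw in ["6 months", "2.5 yrs", "< 1 yr", "3yrs", "Offer", "starting spring 2020"]:
    print(raw, "->", tenure_in_years(raw))
```

Entries such as "Offer" or "starting spring 2020" come back as `None`, so they can be reviewed by hand or treated as zero tenure, depending on what the analysis needs.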

For most of the cleaning I used a fairly rule-based approach.

E.g. if the salary contains "k", multiply by 1,000; if it contains "EUR", multiply by the EUR-USD FX rate; and so on.
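Rules like these could be sketched as follows. This is an illustration rather than the article's actual code: the FX rates are placeholder values, the function name is made up, and treating a salary with no currency code as USD is an assumption.

```python
import re

# Placeholder FX rates to USD, for illustration only; use real rates for the period.
FX_TO_USD = {"USD": 1.0, "EUR": 1.1, "GBP": 1.3, "CAD": 0.8}

def salary_to_usd(text, fx=FX_TO_USD):
    """Extract the first amount in a free-text salary and convert it to USD."""
    upper = text.upper()
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*(K\b)?", upper)
    if not match:
        return None
    amount = float(match.group(1).replace(",", ""))
    if match.group(2):  # "50k" -> 50000
        amount *= 1000
    # Assumption: no recognizable currency code means the salary is in USD.
    currency = next((code for code in fx if code in upper), "USD")
    return round(amount * fx[currency])

for raw in ["50k GBP", "$140,000", "95k$ (CAD)", "$55k USD (310k BRL)"]:
    print(raw, "->", salary_to_usd(raw))
```

A rule set this small will misread some rows. "$7700 per month", for instance, comes back as 7,700 and not as an annual figure, so pay periods need their own rule or a manual check.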

However, there were two columns where categorizing was pretty hard: location, and industry. So I enlisted my friend AI.

3.1 Using OpenAI to clean location data

I began by using a rule-based approach to categorize countries, but it turns out almost all the data is from the US.

Instead, I decided to compare the different regions in the US. But the raw data has a real mix of place hierarchies, which makes a rule-based approach arduous.

fully remote
lcol midwest city
southern, usa
lcol midwest
washington dc
west coast
karachi pakistan
washington, dc area

I'm not an ML engineer, so I wasn't going to write my own model. However, OpenAI has a classifier model (free account needed) that I used for this. It's pretty remarkable - you just pass it some text, and it autocompletes it for you.

I passed it the following:

The following is a list of places in the US

lcol midwest city
southern, usa
washington dc
karachi pakistan
west coast

The following is a list of regions they fit into:

West coast

lcol midwest city - Midwest;
Input into OpenAI Classification model

You click the Submit button in the UI and voila, it autocompletes it for you based on the instructions you gave it:

lcol midwest city - Midwest;
southern, usa - Southeast;
midwest - Midwest;
washington dc - Northeast;
atlanta - Southeast;
socal - Southwest;
karachi pakistan - Non-US;
west coast - West coast;
Output from model

Pretty cool given how little we tell it about the data. Above you can see it correctly classifies Karachi, Pakistan as Non-US.

I then used the output to map into the original data.
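The mapping step might look something like the sketch below. It assumes the model output keeps the `place - Region;` shape shown above; the function name and the "Unknown" fallback for unmatched places are illustrative choices, not the article's code.

```python
def parse_model_output(output):
    """Turn "place - Region;" lines into a {place: region} dict."""
    mapping = {}
    for line in output.splitlines():
        line = line.strip().rstrip(";")
        if " - " not in line:
            continue  # skip blank or malformed lines
        place, region = line.rsplit(" - ", 1)
        mapping[place.strip().lower()] = region.strip()
    return mapping

model_output = """lcol midwest city - Midwest;
southern, usa - Southeast;
washington dc - Northeast;
karachi pakistan - Non-US;
west coast - West coast;"""

region_by_place = parse_model_output(model_output)

# Look up each raw location; anything the model never saw falls back to "Unknown".
locations = ["Washington DC", "karachi pakistan", "fully remote"]
regions = [region_by_place.get(loc.strip().lower(), "Unknown") for loc in locations]
print(regions)  # ['Northeast', 'Non-US', 'Unknown']
```

Normalizing both sides to lowercase keeps the lookup from failing on capitalization alone.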

After all the cleaning, it's not perfect, but it's pretty good:

Columns: date, title, tenure, tenure_clean, location, us_region, salary, salary_usd, industry, industry_group, education, education_level

Row 1 (2019)
senior data scientist
denver metro
West coast
internet/web tech

Row 2 (2019)
biostatistician data scientist
6 months
miami/ft. lauderdale
clinical research organization
b.s. statistics

Row 3 (2019)
data scientist
oil and gas
Oil, gas & mining
masters in applied statistics

Row 4 (2019)
data scientist
1 year
fortune 100
ms statistics

Row 5 (2019)
data scientist
1.5 years
bay area
West coast
Big Tech (FAANG)
phd in engineering

Row 6 (2019)
data scientist
2.5 yrs
irving, tx
Other industry
msc psychology, msc data science

Row 7 (2019)
data scientist
< 1 yr
healthtech, reinforcement learning

Row 8 (2019)
senior data scientist
been here 1 month
West coast
bs engineering/ ms in ds

Row 9 (2019)
data scientist
3 months in current role, 2 years as data analyst
st. louis

Row 10 (2019)
data scientist
starting spring 2020
West coast
big tech company
bs in cs, data science. completing ms in cs

4. Analyze the data

This whole article is written using Evidence, including for the charts. It's a great alternative to BI tools for analyzing and presenting data when you want to add narrative (Disclosure: I work there).

4.1 Commenters are mostly highly educated and US-based

I started by exploring who our commenters were.


As may surprise no one, data scientists are pretty educated: Over 50% have either a Master's or a PhD.

Also, roughly 75% of the responses are from the US, with most from the West and Northeast.

4.2 Average data science salaries & experience

Histograms are generally a good fit for displaying continuous data.


The median data science salary is $115k. Dragged up by a few high values in the dataset, the mean salary is $120k.

The average data scientist in the dataset has 1.94 years of experience, and almost half of the posts are from people with 1 year of experience or less. This data set is a reasonably junior sample.


4.3 Data science salaries are increasing

I looked at the trend of median, 25th percentile and 75th percentile salaries over time.


The median, 25th percentile and 75th percentile salaries all increased between the 2019 and 2021 threads:

  • 25th percentile from $81,500 to $87,379 (+7%)
  • Median salary from $110,000 to $120,000 (+9%)
  • 75th percentile from $148,000 to $160,000 (+8%)

However, it is not a totally smooth trend (e.g. the 75th percentile in 2020 was lower than in 2019). Relatively small sample sizes might be causing noise here.
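To put those changes on a per-year footing, the arithmetic can be sketched as below. The salary figures are the ones quoted above; the two-year gap assumes the 2019 and 2021 threads are two years apart, and the function names are illustrative.

```python
def pct_change(start, end):
    """Total percentage change from start to end."""
    return (end / start - 1) * 100

def annualized(start, end, years):
    """Compound annual growth rate, in percent, over the given number of years."""
    return ((end / start) ** (1 / years) - 1) * 100

# (2019 value, 2021 value) as quoted in the list above.
figures = {
    "25th percentile": (81_500, 87_379),
    "Median": (110_000, 120_000),
    "75th percentile": (148_000, 160_000),
}

for label, (start, end) in figures.items():
    total = pct_change(start, end)
    yearly = annualized(start, end, years=2)
    print(f"{label}: {total:+.1f}% total, {yearly:+.1f}% per year")
```

On these numbers the total increases of 7-9% work out to roughly 3.5-4.5% per year, which is the figure to compare against annual inflation.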

4.4 Gaining experience quickly boosts your comp

The most passive way to increase your salary would be to just keep working to gain experience. Let's look at how median salaries change with tenure:


In the first 5 years, salaries increase from $110,000 to $160,000. After this, the sample size is much smaller, but it appears to flatten off.
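A grouped median like this could be computed along the following lines. The rows are made-up placeholders, not the article's data, and the column names `tenure_years` and `salary_usd` are assumptions about how the cleaned table is laid out.

```python
from collections import defaultdict
from statistics import median

def median_by_group(rows, key, value="salary_usd"):
    """Median of `value` for each distinct `key`, skipping rows missing either field."""
    groups = defaultdict(list)
    for row in rows:
        if row.get(key) is not None and row.get(value) is not None:
            groups[row[key]].append(row[value])
    return {group: median(values) for group, values in groups.items()}

# Placeholder rows for illustration only.
rows = [
    {"tenure_years": 0, "salary_usd": 100_000},
    {"tenure_years": 0, "salary_usd": 120_000},
    {"tenure_years": 5, "salary_usd": 150_000},
    {"tenure_years": 5, "salary_usd": None},  # missing salary is ignored
]
print(median_by_group(rows, "tenure_years"))
```

The same helper works for any categorical column, such as education level, region or industry group, by changing `key`.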

4.5 Going back to school increases your salary

People often go back to college during an economic downturn, as there are fewer opportunities in the job market. But there's a debate about whether further degrees are really worth it. Is it really worth doing a Master's or PhD?

Note: While the "High School" salary is just below "Bachelor's", there were few (11) comments with below college education.

In data science at least, higher levels of education are correlated with higher salaries. Earning a Master's could net you +$15k salary on a Bachelor's, while upgrading a Master's to a PhD could be worth +$45k a year.

4.6 Relocating could get you a pay rise

Salaries across the US are different. Where is it most lucrative to work?

Salary by US region
Note: See Region definitions

Unsurprisingly, the West (which includes the Bay Area) is the area with the highest salaries.

However, even without moving to the West coast you can change your salary significantly by relocating. Those in the Southeast could get a $20k raise if they relocated to the Northeast, Southwest or Midwest.

4.7 Changing industry may get you a bigger paycheck

Another way to increase your salary is to change jobs. But what kind of company should you target?


Perhaps unsurprisingly, landing a job at FAANG is a good way to increase your salary. After that, O&G, Tech and Healthcare are all good bets for higher salaries.

If you are working in consulting, manufacturing, retail or logistics - you might be able to get a $20-30k boost by changing industry.

Wrapping up: Top tips for a higher salary

In summary, from the data that's been posted on Reddit:

  • Data science salaries rose 7-9% between the 2019 and 2021 threads, or roughly 3.5-4.5% per year. If your salary hasn't kept pace, talk to your manager about it, or consider looking around for a new role.
  • There's a gulf in salaries between the different regions of the US: West coast salaries are almost double those in the Southeast.
  • Education matters in data science: PhDs earn $60k more per year than those with Bachelor's only.
  • Not all industries are equal: FAANG roles, other tech firms, O&G and Healthcare are the most lucrative.

I hope you found this useful! I certainly enjoyed exploring the data (cleaned version on GitHub). If there's anything else you'd like to see, let me know in the comments on Reddit!
