Once again I would like to thank everyone for their patience and participation. A new year and a new set of complications, but the analysis is done and the report is published. Since publishing last year’s report I have received a lot of feedback, public and private. Some of it has been great and led to improvements this year. Some of it was valid critique that I found helpful. Some of it was less than polite. So this year, I wanted to address some of the common questions, accusations, and comments up front. I apologize in advance for my frustration; I assure you it is pointed only at the worst of what has been directed at me. So without further ado:

  1. Is this report high quality research?
  2. Will you be publishing the raw data?
  3. Should I trust the results?
  4. Do the results reflect your political bias?
  5. Do you really believe the numbers are accurate?
  6. Will you be doing this again in the future?

Is this report high quality research?

No, not by a long shot, and it may help if I provide a self-critique to explain why, though I am not an expert on this. There are a number of characteristics that constitute high quality research, and this survey report fails most of them.

First, the author. High quality research usually involves multiple highly qualified researchers who have advanced degrees, training, expertise, and experience in the field, with at least one having prior publications. This report has one author, me, a huge red flag. I have a BS in Technical Management, which is unrelated to statistics, machine learning, economics, or compensation structures. I have no prior published research to establish credibility in the area. Additionally, there is no public evidence of my experience with any of these topics.

Second, peer review. High quality research is published in reputable peer-reviewed journals where qualified experts other than the authors review the data and findings. This report was published on a private blog, a huge red flag.

Third, citations. High quality research cites high quality research and is cited by high quality research. Although this report mentions other studies, it does not cite them, and I am unaware of any high quality research citing my work as a source. Another huge red flag.

Fourth, methods. Although using machine learning in research to find patterns and correlations is becoming more prevalent, if I were performing this method manually most statisticians would refer to it as p-hacking or data dredging. That’s basically where you dig through large piles of data hunting for statistical significance. High quality research starts with a hypothesis and uses statistical methods to test it against the null hypothesis. This is how randomized controlled trials (RCTs) are performed.
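
As a quick illustration of why dredging is dangerous (this is a hypothetical simulation in Python, not the method used for this report), consider generating pure noise and then testing many arbitrary subgroup splits. Roughly 5% of the splits will look “significant” at p < 0.05 by chance alone:

```python
# Hypothetical illustration of data dredging: there is no real effect
# in this data, yet some comparisons still come out "significant".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
salaries = rng.normal(100_000, 20_000, size=500)  # pure noise, no real groups

hits = 0
n_tests = 100
for _ in range(n_tests):
    mask = rng.random(500) < 0.5  # an arbitrary, meaningless grouping
    _, p = stats.ttest_ind(salaries[mask], salaries[~mask])
    if p < 0.05:
        hits += 1

print(f"{hits}/{n_tests} arbitrary splits were 'significant' at p < 0.05")
# Expect roughly 5 out of 100 -- statistical significance without meaning.
```

Test enough arbitrary splits and you will always find something. That is why a hypothesis stated up front matters.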

Fifth, the pyramid of evidence. The medical research community has a great resource known as the pyramid of evidence that ranks the quality of evidence from different sources. Expert opinion, for example, is the lowest form of evidence. This report at best might qualify as a cross-sectional or cohort study. At worst, it’s an editorial. This report is descriptive and can NOT prove causation, which is also why I use uncertain phrasing such as “seems” or “appears”. This report describes a point-in-time observation and sits quite low on the pyramid of evidence. Higher level evidence includes RCTs, where specific interventions are tested with a control group and a treatment group, and meta-analyses, which aggregate results from many other studies.

Sixth, the author is part of the study. As a member of the community, I am a participant, a beneficiary, and a conductor of the study. Big red flag.

Seventh, tools. Rather than using tools commonly used in research, I built my own. That introduces potential for uncaught bugs and mistakes. See also: only one author. Red. Flag.

And more! There are likely other issues that someone more formally trained in research would catch. The important lesson here is to learn to evaluate evidence and research. The worst thing you can do is place too much trust in someone or something that doesn’t warrant it.

Will you be publishing the raw data?

No. You may have noticed that I didn’t mention unavailability of raw data when discussing high quality research. That’s because it’s common enough that studies have been performed analyzing the prevalence and impact of unpublished and selectively published data in research. There are also plenty of editorials discussing whether or not raw data should be published. So it doesn’t seem that a failure to publish raw data is necessarily a red flag.

But no, I will not be publishing it or making it available to anyone. It will be deleted as soon as this is published. I committed to protecting people’s data and identities. This data can easily be used to identify people and I have no intention of putting that in anyone else’s hands. I understand this can be frustrating for some. Distrust in other surveys with unavailable data is what led me to start this survey in the first place. But I don’t believe the solution is making the data publicly available. I think the solution is more studies and more people attempting to validate the findings. The core of scientific discovery is replication of results.

Should I trust the results?

No. Nor do I expect you to trust me. I encourage you to examine the results. Examine the methods. Compare them with other sources and other research. If you find something wrong and have the evidence to back it up, I will be happy to amend my findings. If your inclination is to dismiss the findings without examination, please don’t be surprised if I dismiss your opinion as quickly as you dismissed my months of work and effort.

Do the results reflect your political bias?

No. At least, not my political bias when I started the first study. I entered the first survey with a modified-meritocracy mental model of the industry, if not the economy overall. Many of the pay gap arguments that the survey challenged, for example, are arguments that I would have made at the start of the first survey. I figured a pay gap of some kind did exist but tended to attribute most of it to other factors. If anything, these two surveys, in conjunction with reviewing high quality research on the subjects, have changed my beliefs and mental models. I enter these surveys with curiosity to better understand the industry. My beliefs follow the data and evidence, not the other way around.

Do you really believe the numbers are accurate?

Some of the numbers are more accurate than others. I have high confidence in the numbers from the US; for other countries, much less so. You have to understand, the published numbers are not an indication of what I believe. I am reporting what other people submitted, nothing more. When I report the quartiles, those are the actual quartiles of the survey responses. I’m not looking at the data, making magical speculations, and then publishing the salary numbers I think are accurate. These numbers are as reported in the survey, minus a couple of submissions that were identified and confirmed as fake. If I could not confirm a submission as fake, it stayed in the data set.
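
To make that concrete, here is a minimal sketch with made-up numbers (these are not survey figures): the reported quartiles are simply computed directly from the submitted responses, with no modeling or adjustment on top.

```python
# Hypothetical salary submissions -- not real survey data.
import numpy as np

submitted = np.array([62_000, 75_000, 88_000, 95_000, 110_000, 130_000, 150_000])

# The reported quartiles are order statistics of the responses,
# with no weighting, modeling, or manual adjustment.
q1, median, q3 = np.percentile(submitted, [25, 50, 75])
print(f"Q1={q1:,.0f}  median={median:,.0f}  Q3={q3:,.0f}")
```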

It’s also important to keep in mind that there are a number of factors that could skew the data. The population sample is effectively bounded by the reach of my professional network, so there is likely bias in that data (such as overrepresentation of US-based workers). Since a survey is self-reported data, there is also the potential for data entry errors, uncaught fake submissions, and self-report biases.

So no, I don’t believe the numbers are accurate. I know they are inaccurate to some degree, and I encourage using multiple sources to try to ascertain the truth. But inaccurate data does not mean wrong data, nor does it mean irrelevant data, and those distinctions should be clear.

Will you be doing this again in the future?

Probably not.

I do feel that the work is important, but there are two main reasons I likely will not be doing this again. The first is that the responsibility for this should ultimately rest with employers. The second is that the negativity around it has dissuaded me.

First, let’s address the responsibility. The reason this survey is even necessary is that employers continuously fail to establish and communicate clear guidelines on compensation and on what their employees need to do to earn more. For two years, I have spent nights and weekends over many months collecting and poring over data that most employers have readily available in their HR systems. This is particularly true in professional services and consulting, where the value of every billable employee is roughly measurable. If employers really wanted to address or solve these questions, they could. And however much it may seem that I enjoy tilting at windmills, I do not.

Second, I’m exhausted from the negativity. In the aftermath of the first survey, I had people reach out publicly and privately to accuse me of political bias, manipulating the data, and other malicious intentions. I’ve been called all manner of names and I have been ridiculed. To top it all off, this year I had to contend with trying to identify false submissions. I am not looking forward to the repeat after publishing this year’s survey, and at the moment I am not interested in doing it again.

That said, I offer an open invitation to help anyone who would like to conduct their own survey. My tools are publicly available. My methods are documented. My help is freely available.