Data Validation
We validate each incoming survey using a combination of machine learning and human review.
- Each survey passes through 25 hard-coded validation points. A survey that fails any one of these validation points is removed from the data set.
- Within each job title, surveys that are deemed to be outliers, based on the reported compensation, are also removed.
- PayScale uses a proprietary fingerprinting technology to detect and remove multiple submissions.
- PayScale tracks submission trends to identify and remove any questionable data.
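The within-job-title outlier step can be illustrated with a short sketch. PayScale does not publish its outlier rule, so the Tukey (interquartile range) fences, the multiplier k=1.5, the function name, and the (job_title, compensation) tuple layout below are all assumptions chosen for illustration only.

```python
from collections import defaultdict
from statistics import quantiles


def remove_outliers(surveys, k=1.5):
    """Drop surveys whose pay falls outside Tukey fences within their job title.

    `surveys` is a list of (job_title, compensation) tuples. The tuple layout,
    the IQR rule, and k=1.5 are illustrative assumptions, not PayScale's method.
    """
    # Group reported compensation by job title.
    by_title = defaultdict(list)
    for title, pay in surveys:
        by_title[title].append(pay)

    # Compute lower and upper fences for each job title.
    fences = {}
    for title, pays in by_title.items():
        if len(pays) < 4:
            # Too few points to estimate quartiles sensibly; keep everything.
            fences[title] = (float("-inf"), float("inf"))
            continue
        q1, _, q3 = quantiles(pays, n=4, method="inclusive")
        iqr = q3 - q1
        fences[title] = (q1 - k * iqr, q3 + k * iqr)

    # Keep only surveys that fall inside the fences for their own job title.
    return [(t, p) for t, p in surveys if fences[t][0] <= p <= fences[t][1]]
```

Grouping by job title before computing fences matters: a salary that is ordinary for one title can be extreme for another, so a single global threshold would remove the wrong records.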
Incumbent and Employer Counts/Cuts:
Counts: The compensation model uses all available data within a job/country combination to produce specific pay range estimates. Although the model weights more heavily those profiles that most closely match the details of the position and cut, all profiles within the job/country combination are used. Therefore, the incumbent count listed for each specific cut represents the number of profiles available in the job/country combination.
The number of different employers represented is at least the number shown here. We do not require data providers to disclose their employer name, so the number we report is the minimum number of employers included.
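The reason the employer count is a lower bound can be shown with a small sketch. The field name "employer", the function name, and the case-insensitive name matching are hypothetical choices for illustration; the source does not describe how employer names are stored or matched.

```python
def minimum_employer_count(profiles):
    """Return a lower bound on the number of distinct employers.

    Each profile is a dict with an optional "employer" key (a hypothetical
    field name). A profile without a name may belong to an employer that is
    already counted, so it adds nothing to the lower bound.
    """
    names = {
        p["employer"].strip().lower()  # assumed normalization of names
        for p in profiles
        if p.get("employer") and p["employer"].strip()
    }
    return len(names)
```

The true number of employers can only be equal to or larger than this value, because every undisclosed profile could come from an employer not yet counted.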
Cuts: PayScale's Crowd Cut Data reports multiple cuts (labor markets) for each position. Location and employer type are crucial when pricing health care positions. We produced reports for more than 200 Metropolitan Statistical Areas (MSAs), which are geographic regions defined by the United States Office of Management and Budget (OMB). You can learn more about MSAs here: https://www.census.gov/programs-surveys/metro-micro.html. The employer types found in each crowd cut are extensive and are listed in the corresponding file in the Detailed Files section.
Data Reporting
We produce the following statistics for salary, bonus paid, and total cash compensation:
- Minimum Number of Companies
- Number of Incumbents
- 10th Percentile
- 25th Percentile
- 50th Percentile
- 75th Percentile
- 90th Percentile
- Average
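For readers who want to see what these statistics look like on a raw sample, here is a minimal sketch using only the Python standard library. It computes empirical percentiles from a list of values; the reported figures come from the compensation model rather than from a direct calculation like this one. The function name and dictionary keys are illustrative, and the employer count is omitted.

```python
from statistics import mean, quantiles


def summarize(values):
    """Empirical summary of one compensation component (e.g. base salary)."""
    # n=10 yields 9 cut points: the 10th, 20th, ..., 90th percentiles.
    deciles = quantiles(values, n=10, method="inclusive")
    # n=4 yields 3 cut points: the 25th, 50th, and 75th percentiles.
    quartiles = quantiles(values, n=4, method="inclusive")
    return {
        "incumbents": len(values),
        "p10": deciles[0],
        "p25": quartiles[0],
        "p50": quartiles[1],
        "p75": quartiles[2],
        "p90": deciles[8],
        "average": mean(values),
    }
```

The same function would be called once each for salary, bonus paid, and total cash compensation.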
Compensation Model
Compensation Model: PayScale employs a proprietary parametric Bayesian model for constructing pay ranges and estimates. This model produces pay ranges for individual positions conditional on the data provided. We model pieces of compensation both individually and at the aggregate level, so we have separate models at the job title/country level for base, bonus, and total cash compensation.
The model prioritizes both the most current and the most salient data, meaning recent profiles that most closely match the respondent’s compensable characteristics are weighted more heavily in the creation of the conditional salary range. We assume that compensation follows a distribution from the double Pareto-lognormal family. This allows the data to follow an asymmetric bell curve that can take a variety of shapes depending on job title and location.
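To give a feel for the shape of this distribution family, the sketch below draws samples from a double Pareto-lognormal distribution using its standard representation: the logarithm of the value is a normal draw plus one scaled exponential draw minus another. It does not show how PayScale fits or conditions its model, which is proprietary. The function name and all parameter values in the usage note are made up for illustration.

```python
import math
import random


def sample_dpln(n, nu, tau, alpha, beta, seed=0):
    """Draw n values from a double Pareto-lognormal distribution.

    Uses log X = nu + tau * Z + E1 / alpha - E2 / beta, where Z is standard
    normal and E1, E2 are independent standard exponentials. alpha controls
    the heaviness of the upper tail and beta that of the lower tail.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        log_x = (
            nu
            + tau * rng.gauss(0.0, 1.0)
            + rng.expovariate(1.0) / alpha
            - rng.expovariate(1.0) / beta
        )
        samples.append(math.exp(log_x))
    return samples
```

For example, `sample_dpln(20000, math.log(60000), 0.2, 4.0, 8.0)` produces a right-skewed set of salaries centered near 60,000 to 70,000. Setting alpha smaller than beta stretches the upper tail relative to the lower one, which is how the family accommodates asymmetric curves.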