Large, publicly available public-health data sets tend to contain a very large number of variables. This enables many different analytical comparisons and increases the risk of reporting chance correlations (false-positive findings) through data dredging or p-hacking.
high
descriptive
High-dimensional public-health data sets offer many opportunities for exploratory analyses, which can produce spurious associations unless the analyses are corrected for multiple comparisons or the findings are validated on independent data.
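
The scale of this risk can be illustrated with a short calculation. The sketch below is not taken from the source: the function names are hypothetical, the significance level of 0.05 is an assumed convention, and the family-wise error formula assumes the tests are independent, which rarely holds exactly for real survey variables. It counts the pairwise comparisons available among a set of variables, the number of false positives expected when no real association exists, and the Bonferroni-adjusted threshold that would hold the family-wise error rate at the chosen level.

```python
def pairwise_comparisons(num_variables: int) -> int:
    """Number of distinct variable pairs that could be tested for association."""
    return num_variables * (num_variables - 1) // 2


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected count of significant results when every null hypothesis is true."""
    return num_tests * alpha


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / num_tests


if __name__ == "__main__":
    # Hypothetical data set with 100 variables (an assumption for illustration).
    tests = pairwise_comparisons(100)
    print("pairwise tests:", tests)
    print("expected false positives:", expected_false_positives(tests))
    print("family-wise error rate for 20 tests:", familywise_error_rate(20))
    print("Bonferroni threshold:", bonferroni_threshold(tests))
```

Under these assumptions, 100 variables allow 4,950 pairwise tests, of which about 248 would be expected to reach p < 0.05 by chance alone. Even 20 independent tests carry roughly a 64% chance of at least one false positive. Bonferroni is only one possible correction and is conservative; false-discovery-rate procedures or validation on held-out data are common alternatives.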