November 7, 2017

Simple statistics can be good enough

by King Abdullah University of Science and Technology

Study of the mismatch between spatial environmental data and a commonly used statistical analysis suggests simpler statistics are sufficient in many cases.

Environmental scientists and their statistician colleagues face a common dilemma: Do simpler statistical tests properly characterize a data set? And is it worth the effort to derive and apply statistical methods that are possibly better matched but more difficult to interpret? In most cases the path of least resistance wins, but the choice of a simple statistical basis can cast slight doubt on the validity of statistically derived study results.

KAUST researcher Marc Genton and his doctoral student Yuan Yan developed a framework to test exactly how inaccurate a mismatch between data and statistical analysis could be, and the results are surprising.

"Researchers tend to fit spatial data with a simple Gaussian model—the classic symmetric bell curve around the average value—even though data might have an asymmetric distribution with features that diverge from Gaussian," says Yan. "We investigated the effect of the 'non-Gaussianity' of data on statistical estimation and prediction under the wrong Gaussian assumption."

Gaussian distributions are generally intuitive, with an average value and standard deviations from the average that imply some narrow or broad distribution of data. They are widely applied and understood, both from a practitioner perspective and for nontechnical users. But, in many situations, particularly for environmental data, the distribution of data is skewed. Wind speed and rainfall, for example, cannot be less than zero, yet a Gaussian distribution with a small average value but extended distribution to higher values can have a tail at the lower end that extends to negative values—certainly wrong, but by how much?

One of the most important concepts in spatial statistical analyses is how strongly data influence each other when a certain distance apart, which is given by what is known as the covariance function. Genton and Yan set out to systematically study the effect of applying a Gaussian model to estimate the covariance function for non-Gaussian data.

"We developed a tailored simulation scheme to generate non-Gaussian spatial data with a given covariance structure," says Genton. "We showed through our simulation study that when spatial data are non-Gaussian, the Gaussian likelihood estimator of covariance parameters still performs better than an alternative weighted least-squares estimator for data that are not heavily skewed."

The finding suggests that the simple Gaussian model is in fact generally adequate for parameter estimation for spatial data in many cases, offering some comfort to spatial scientists about their choice of statistical approach.

More information: Yuan Yan et al. Gaussian likelihood inference on data from trans-Gaussian random fields with Matérn covariance function, Environmetrics (2017). DOI: 10.1002/env.2458

Provided by King Abdullah University of Science and Technology

Citation: Simple statistics can be good enough (2017, November 7) retrieved 26 April 2024 from https://phys.org/news/2017-11-simple-statistics-good.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

Dataset size counts for better climate and environmental predictions

19 shares

Feedback to editors

Simple statistics can be good enough

Managing meandering waterways in a changing world

New dataset sheds light on relationship of far-red sun-induced chlorophyll fluorescence to canopy-level photosynthesis

How much trust do people have in different types of scientists?

Scientists say voluntary corporate emissions targets not enough to create real climate action

Barley plants fine-tune their root microbial communities through sugary secretions

A shortcut for drug discovery: Novel method predicts on a large scale how small molecules interact with proteins

Yeast study offers possible answer to why some species are generalists and others specialists

Cichlid fishes' curiosity promotes biodiversity: How exploratory behavior aids in ecological adaptation

Climate change could become the main driver of biodiversity decline by mid-century, analysis suggests

First-of-its-kind study shows that conservation actions are effective at halting and reversing biodiversity loss

Relevant PhysicsForums posts

Innumeracy in public media today

How to convert ft-lbs/sec to Newtons?

Spherical trig - sphere radius from 6 lengths

Clever Geometry in this Video

Correlation vs causality implied by a graph

Help with new calculator selection (button choices)

Dataset size counts for better climate and environmental predictions

Improving connections for spatial analysis

New statistical approach for environmental measurements lets the data determine how to model extreme events

New model predicts wind speeds more accurately with three months of data than others do with 12

Finding patterns in corrupted data: New model-fitting technique efficient even for data sets with hundreds of variables

Going to extremes to predict natural disasters

A periodic table of primes: Research team claims that prime numbers can be predicted

'I had such fun!', says winner of top math prize

Ice-ray patterns: A rediscovery of past design for the future

Paper offers a mathematical approach to modeling a random walker moving across a random landscape

How do neural networks learn? A mathematical formula explains how they detect relevant patterns

Mathematicians prove Pólya's conjecture for the eigenvalues of a disk, a 70-year-old math problem

Medical Xpress

Tech Xplore

Science X

Simple statistics can be good enough

Managing meandering waterways in a changing world

New dataset sheds light on relationship of far-red sun-induced chlorophyll fluorescence to canopy-level photosynthesis

How much trust do people have in different types of scientists?

Scientists say voluntary corporate emissions targets not enough to create real climate action

Barley plants fine-tune their root microbial communities through sugary secretions

A shortcut for drug discovery: Novel method predicts on a large scale how small molecules interact with proteins

Yeast study offers possible answer to why some species are generalists and others specialists

Cichlid fishes' curiosity promotes biodiversity: How exploratory behavior aids in ecological adaptation

Climate change could become the main driver of biodiversity decline by mid-century, analysis suggests

First-of-its-kind study shows that conservation actions are effective at halting and reversing biodiversity loss

Relevant PhysicsForums posts

Related Stories

Dataset size counts for better climate and environmental predictions

Improving connections for spatial analysis

New statistical approach for environmental measurements lets the data determine how to model extreme events

New model predicts wind speeds more accurately with three months of data than others do with 12

Finding patterns in corrupted data: New model-fitting technique efficient even for data sets with hundreds of variables

Going to extremes to predict natural disasters

Recommended for you

A periodic table of primes: Research team claims that prime numbers can be predicted

'I had such fun!', says winner of top math prize

Ice-ray patterns: A rediscovery of past design for the future

Paper offers a mathematical approach to modeling a random walker moving across a random landscape

How do neural networks learn? A mathematical formula explains how they detect relevant patterns

Mathematicians prove Pólya's conjecture for the eigenvalues of a disk, a 70-year-old math problem

Newsletter sign up

Donate and enjoy an ad-free experience