Statistics and data science

A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

Big data in the 1990s

Huber (1994, 1996)

| Data Size | Bytes | Storage Mode |
|-----------|-------|--------------|
| tiny | \(10^2\) | piece of paper |
| small | \(10^4\) | a few pieces of paper |
| medium | \(10^6\) (MB) | a floppy disk |
| large | \(10^8\) | hard disk |
| huge | \(10^9\) (GB) | hard disk(s) |
| massive | \(10^{12}\) (TB) | hard disk(s); RAID storage |
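
To make the size classes concrete, here is a minimal Python sketch that maps a byte count to a category from the table above. The function name `size_category` and the decision to treat each listed size as the lower bound of its class are assumptions made for illustration; the table itself only gives rough orders of magnitude.

```python
from bisect import bisect_right

# (category, approximate size in bytes), copied from the table above.
SIZE_CLASSES = [
    ("tiny", 10**2),
    ("small", 10**4),
    ("medium", 10**6),
    ("large", 10**8),
    ("huge", 10**9),
    ("massive", 10**12),
]


def size_category(n_bytes: int) -> str:
    """Return the largest category whose listed size does not exceed n_bytes.

    Assumption: each listed size is read as the lower bound of its class.
    Anything smaller than the first listed size is still labeled "tiny".
    """
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    bounds = [size for _, size in SIZE_CLASSES]
    # bisect_right counts how many listed sizes are <= n_bytes.
    idx = max(bisect_right(bounds, n_bytes) - 1, 0)
    return SIZE_CLASSES[idx][0]


if __name__ == "__main__":
    # A 1.44 MB floppy image, a 1 GB file, and a 5 TB archive.
    for n in (1_440_000, 10**9, 5 * 10**12):
        print(n, size_category(n))
```

Under this reading, a 1.44 MB floppy image counts as "medium", which agrees with the storage mode listed in the table.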

Big data in the 21st century

Four V’s of big data: volume, velocity, variety, and veracity.

Source: IBM.

A typical data scientist on LinkedIn

Course description

References

Huber, P. J. (1994). Huge data sets. In COMPSTAT 1994 (Vienna) (pp. 3–13). Heidelberg: Physica.

Huber, P. J. (1996). Massive data sets workshop: The morning after. In Massive data sets: Proceedings of a workshop (pp. 169–184). Washington: National Academy Press.