Reliability & Validity

My interest in reliability and validity centers on their relevance to chronological studies. How can the reliability of a dating method be tested? How can the validity of the results be checked? A look at the basics of the two terms is a good place to begin answering those questions.

Reliability is similar to precision, and validity is similar to accuracy. In some cases the paired terms are used synonymously.


The four targets above help to show what precision/reliability and accuracy/validity look like. The goal is for a dating method to be as accurate/valid and precise/reliable as possible, as seen in the top right target. If the results of a method look like the bottom left target, that method needs refining.

Top right shows results that are all very close to the bullseye, if not on it entirely. This is ideal.

Top left shows results that center on the bullseye but are still scattered around it. The precision here can be improved.

Bottom right shows results that are all very close to the same spot but are not anywhere close to the bullseye. The accuracy here can be improved.

Bottom left shows results that are all over the place. This shows low reliability for hitting the same mark each time and low accuracy in hitting the bullseye.
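To put rough numbers on the four targets, here is a minimal sketch. It assumes a sample whose true age is already known, and it treats accuracy as the bias (how far the average result sits from the true value) and precision as the spread (the standard deviation of the results). The function name, the true age of 1000 years, and all of the result values are made up for illustration.

```python
from statistics import mean, stdev


def accuracy_and_precision(results, true_value):
    """Return (bias, spread) for repeated measurements of one known value."""
    bias = mean(results) - true_value  # how far the average lands from the bullseye
    spread = stdev(results)            # how scattered the results are around their own average
    return bias, spread


# Hypothetical dating results (in years) for a sample assumed to be 1000 years old.
known_age = 1000
targets = {
    "top right (accurate, precise)": [998, 1001, 1000, 1002, 999],
    "top left (accurate, imprecise)": [900, 1100, 950, 1060, 990],
    "bottom right (inaccurate, precise)": [1198, 1201, 1200, 1202, 1199],
    "bottom left (inaccurate, imprecise)": [1100, 1300, 1150, 1260, 1190],
}

for label, results in targets.items():
    bias, spread = accuracy_and_precision(results, known_age)
    print(f"{label}: bias = {bias:+.1f} years, spread = {spread:.1f} years")
```

A small bias with a small spread corresponds to the top right target. A large bias with a small spread corresponds to the bottom right target: the method repeats itself well but keeps landing in the wrong place.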



There are four types of reliability that I’ve heard of so far:

1 – Test-retest Reliability

2 – Parallel Forms Reliability

3 – Inter-rater Reliability

4 – Internal Consistency Reliability

I’ve also heard of five types of validity:

1 – Face Validity

2 – Construct Validity

3 – Criterion-Related Validity

4 – Formative Validity

5 – Sampling Validity


Four Types of Reliability

1 – Test-retest Reliability: measures reliability by conducting the same test multiple times over a period of time.[1], [3]

Also known as: retest reliability.[3]

Synonyms: coefficient of stability; repeatability; reproducibility of test results.[4, p.6622]

2 – Parallel Forms Reliability:

3 – Inter-rater Reliability: measures the amount that two or more raters/observers agree in the assessment of the same object.[1], [2, p.1348]

Synonyms: concordance; inter-observer reliability; inter-rater agreement; scorer reliability.[2, p.1348]

4 – Internal Consistency Reliability: see [1]

4a – Average Inter-item Correlation: see [1]

4b – Split-half Reliability: see [1]
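As a rough sketch of how two of the types above might be put into numbers: test-retest reliability is often summarized with a correlation between the first and second run of the same test, and inter-rater reliability for two raters is often summarized with Cohen's kappa, which corrects the raw agreement for the agreement expected by chance. Both choices are my assumptions here and are not taken from the references. The function names, the run values, and the "early"/"late" labels are all hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of numbers."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    covariance = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    spread_1 = sqrt(sum((a - mean_1) ** 2 for a in first))
    spread_2 = sqrt(sum((b - mean_2) ** 2 for b in second))
    if spread_1 == 0 or spread_2 == 0:
        raise ValueError("correlation is undefined when one list never varies")
    return covariance / (spread_1 * spread_2)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same category at random.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Test-retest: the same five (hypothetical) samples dated on two occasions, in years.
first_run = [1000, 1500, 2000, 2500, 3000]
second_run = [1010, 1480, 2020, 2490, 3005]
print(f"test-retest correlation: {pearson_r(first_run, second_run):.3f}")

# Inter-rater: two (hypothetical) observers sorting the same ten objects into periods.
rater_a = ["early"] * 5 + ["late"] * 5
rater_b = ["early"] * 4 + ["late"] * 5 + ["early"]
print(f"Cohen's kappa: {cohens_kappa(rater_a, rater_b):.2f}")
```

One caveat with the correlation: it stays high even if the second run is shifted by a constant amount, so it measures consistency of ordering and spacing, not whether the two runs give the same values.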

Five Types of Validity

1 – Face Validity: see [1]

2 – Construct Validity: see [1]

3 – Criterion-Related Validity: see [1]

4 – Formative Validity: see [1]

5 – Sampling Validity: see [1]

Annotated Bibliography

This annotated bibliography covers the references used for this article. I identify each entry by its reference number only, so that the full citation isn’t repeated. The references follow directly after this bibliography.

[1] – This reference is a webpage on the College of Humanities and Fine Arts’ Student Outcomes Assessment website. Phelan & Wren provided a simple and useful guide to the main types of reliability and validity. The main point of their paper was to give a brief overview of how to gauge reliability and validity in the realm of academic assessment. The examples used for each type were usually about how students respond to test taking. In this way, I think they offered some interesting perspectives on how academic comprehension is tested. I think this is a good reference for anyone who is completely new to this area of scholarship because it is a short read and has a lot of useful information.

[2] – This reference is a section in the Encyclopedia of Clinical Neuropsychology. It’s a really short paragraph of only seven sentences about inter-rater reliability. The first six sentences give a general overview of what inter-rater reliability is, and the final sentence gives three examples of how it is applied in the field of clinical neuropsychology. I found this reference useful not only for its summary but also because it includes synonyms for the term “inter-rater reliability”.

[3] – This reference is a short webpage that concisely summarizes Test-retest Reliability.



[1] – Colin Phelan & Julie Wren. “EXPLORING RELIABILITY IN ACADEMIC ASSESSMENT” (2005?). Accessed 16 Apr. 2021.

[2] – Lange, R. T. (2011). Inter-rater Reliability. Encyclopedia of Clinical Neuropsychology, 1348–1348. doi:10.1007/978-0-387-79948-3_1203. Accessed 17 Apr. 2021.

[3] – Stephanie Glen. “Test-Retest Reliability / Repeatability” From Elementary Statistics for the rest of us! Accessed 19 Apr. 2021.

[4] – Vilagut, G. (2014). Test-Retest Reliability. Encyclopedia of Quality of Life and Well-Being Research, 6622–6625. doi:10.1007/978-94-007-0753-5_3001. Accessed 19 Apr. 2021.

[5] – Glossary for Reliability. Accessed 30 Apr. 2021.
