This project extracts all human birth dates recorded in Wikidata, maps each date
to its position in the lunar cycle, and searches for any periodic signal using
distribution analysis and Fourier methods across two centuries of records.
Abstract
The belief that full moons trigger a higher number of births is among the most persistent
myths in popular culture and nursing folklore. We tested this claim using the largest
structured biographical dataset publicly available: the Wikidata knowledge graph,
which at the time of extraction contained approximately 110 million entity records.
After filtering for humans with day-precision birth dates, removing duplicated and
mislabelled records, and computing the lunar cycle position of each birth date using
high-precision astronomical ephemerides (pyephem), we obtained a final sample of
2,813,850 births spanning the period 1800–1999.
The distribution of births across the lunar cycle is statistically indistinguishable from
uniform across all four half-century sub-periods examined. A power spectral analysis of the
binned distribution reveals no significant signal at the fundamental lunar frequency
(f = 1, one cycle per lunar month). Minor oscillations near f = 12 appear to stem
from a Wikidata data-quality artefact: placeholder birth dates heaped on the first
day of the month.
These results constitute the largest population-level test of the lunar birth
hypothesis to date and provide strong evidence against any association between
lunar phase and birth frequency.
Approach
Birth dates are extracted from the full Wikidata JSON dump via a multiprocessing
pipeline and stored as partitioned Parquet files. For each birth date, the lunar
cycle progress is computed using astronomical ephemerides from
pyephem — a value between 0 (new moon) and 1 (next new moon), passing
through 0.5 at full moon.
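The following is a minimal sketch of the per-entity filter described above. It assumes the standard layout of the Wikidata JSON dump, which has one entity per line inside a top-level JSON array, with each line ending in a comma. The property and item IDs are Wikidata's own: P31 is "instance of", Q5 is "human", and P569 is "date of birth". A time precision of 11 means day precision. The function names are hypothetical, and the multiprocessing and Parquet stages are omitted. Skipping Julian-calendar dates is an assumption of this sketch, not a documented step of the project.

```python
import json
import re
from typing import Iterator, Optional, Tuple

HUMAN = "Q5"          # Wikidata item "human"
DAY_PRECISION = 11    # Wikidata time precision: 11 = day, 10 = month, 9 = year
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # proleptic Gregorian calendar

# Wikidata time strings look like "+1879-03-14T00:00:00Z".
TIME_RE = re.compile(r"^([+-])(\d+)-(\d{2})-(\d{2})T")


def parse_dump_line(line: str) -> Optional[dict]:
    """Parse one dump line into an entity dict; return None for the array brackets."""
    line = line.strip().rstrip(",")
    if line in ("[", "]", ""):
        return None
    return json.loads(line)


def claim_values(entity: dict, prop: str) -> Iterator[dict]:
    """Yield the datavalue 'value' of each claim, skipping novalue/somevalue snaks."""
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            yield snak["datavalue"]["value"]


def extract_birth_date(entity: dict) -> Optional[Tuple[int, int, int]]:
    """Return (year, month, day) for a human with a day-precision birth date, else None."""
    if not any(v.get("id") == HUMAN for v in claim_values(entity, "P31")):
        return None
    for value in claim_values(entity, "P569"):
        if value.get("precision") != DAY_PRECISION:
            continue
        if value.get("calendarmodel") != GREGORIAN:
            continue  # assumption: Julian-calendar dates are skipped rather than converted
        match = TIME_RE.match(value["time"])
        if match:
            sign, year, month, day = match.groups()
            return (int(year) * (-1 if sign == "-" else 1), int(month), int(day))
    return None
```

In a full pipeline, each worker process would presumably apply `parse_dump_line` and `extract_birth_date` to its share of the dump lines and write the surviving tuples to Parquet.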
The resulting distribution of births across lunar phases is then analysed both
visually and with Fourier analysis to test whether any phase-specific elevation
persists across different historical periods.
What was tested
Record availability and temporal coverage by birth year
Global birth distribution across the full 29.5-day lunar cycle
Stability across half-century cohorts (1800–1999) to detect any era-specific pattern
Fourier frequency content of detrended birth-count histograms to probe for any weak but consistent periodic signal
Figures
Record availability by birth year
The bar chart shows the number of birth records per year extracted from Wikidata.
Coverage is strikingly uneven: before 1800 it is sparse, confined almost entirely
to historically prominent figures — monarchs, scientists, philosophers, military
commanders — whose dates of birth were preserved in written records. From the
mid-19th century onward, counts rise sharply as civil registration became
widespread across Europe and North America, and as more recent biographies
entered Wikidata in bulk.
This non-uniform sampling is a fundamental caveat for the entire analysis.
The dataset is not a random draw from all human births; it is a heavily
biased sample skewed toward notable individuals and toward the modern era.
Any lunar signal detected in the aggregate must therefore survive this
sampling structure to be considered meaningful.
Birth distribution across the lunar cycle — all years combined
The x-axis spans one complete lunar cycle: 0 corresponds to new moon,
0.5 to full moon, and 1 to the following new moon. Each bin counts the
number of births occurring when the Moon was at that phase, normalised
so that a perfectly uniform distribution would appear as a flat line.
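The phase coordinate on the x-axis can be approximated without an ephemeris library. This sketch deliberately replaces pyephem's ephemerides with a mean-synodic-month model anchored at the new moon of 2000-01-06 (18:14 UTC). Individual phases can therefore differ from ephemeris-based values by up to a few percent of a cycle. With pyephem, one would instead bracket each date between `ephem.previous_new_moon` and `ephem.next_new_moon`. Assigning noon UTC to day-precision dates is also an assumption of this sketch.

```python
from datetime import datetime, timezone

# Mean length of the synodic month (new moon to new moon), in days.
SYNODIC_MONTH = 29.530588853
# Reference new moon used as phase 0 of the uniform model.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def lunar_progress(when: datetime) -> float:
    """Approximate fraction of the lunar cycle elapsed at `when`, in [0, 1).

    0 = new moon, about 0.5 = full moon, values approach 1 just before the next
    new moon. Uses a uniform mean-month model, not true ephemerides.
    """
    days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return (days / SYNODIC_MONTH) % 1.0  # Python's % keeps the result in [0, 1) for past dates


def birth_progress(year: int, month: int, day: int) -> float:
    """Day-precision dates carry no time of day, so noon UTC is assumed."""
    return lunar_progress(datetime(year, month, day, 12, tzinfo=timezone.utc))
```

For example, `birth_progress(1879, 3, 14)` returns a single float in [0, 1). Applying it to every record yields the values that are binned in the histograms.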
Across the entire dataset, the histogram is strikingly flat. There is no
sustained elevation around full moon — the phase most commonly invoked in
folk claims about lunar influence — nor around any other particular phase.
Fluctuations between bins are small and consistent with statistical noise
expected in a large but irregularly sampled population.
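The normalisation described above can be sketched with NumPy, along with a way to quantify "consistent with statistical noise". The sketch assumes a hypothetical 30 bins, since the bin count used for the figures is not stated here. Pearson's chi-square statistic is one standard way to compare a histogram against a flat line, although the source does not say which test was used. Under uniformity, the statistic equals its degrees of freedom on average.

```python
import numpy as np


def normalised_histogram(progress, n_bins=30):
    """Bin lunar-cycle progress values in [0, 1) and scale so uniform data gives 1.0 per bin."""
    counts, _ = np.histogram(progress, bins=n_bins, range=(0.0, 1.0))
    expected = counts.sum() / n_bins
    return counts, counts / expected


def chi_square_uniform(counts):
    """Pearson chi-square statistic against a uniform distribution, plus degrees of freedom."""
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() / counts.size
    stat = float(((counts - expected) ** 2 / expected).sum())
    return stat, counts.size - 1


def poisson_relative_scatter(n_total, n_bins):
    """Typical relative bin-to-bin scatter, 1 / sqrt(expected count), from counting noise alone."""
    return 1.0 / np.sqrt(n_total / n_bins)
```

For scale, `poisson_relative_scatter(2_813_850, 30)` is about 0.0033. With 30 bins, counting noise alone therefore moves each normalised bin by roughly ±0.33% around 1.0. Date heaping and uneven coverage are expected to add to this floor.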
If the Moon exerted a meaningful influence on birth timing, we would
expect a reproducible peak (or trough) at one or more phase values. The
absence of such a feature in the aggregate distribution is the first and
most direct negative result.
Birth distribution by half-century (1800–1999)
To rule out the possibility that a real signal is obscured by combining
very different populations, the data are split into four half-century
cohorts: 1800–1849, 1850–1899, 1900–1949, and 1950–1999. Each panel
shows the birth distribution across the lunar cycle independently for
that period.
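The cohort split amounts to integer division of the birth year. This minimal sketch uses hypothetical helper names and assumes that each record has already been reduced to a (year, progress) pair.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def cohort_label(year: int) -> Optional[str]:
    """Return the half-century label for a birth year, or None outside 1800-1999."""
    if not 1800 <= year <= 1999:
        return None
    start = 1800 + 50 * ((year - 1800) // 50)
    return f"{start}-{start + 49}"


def split_by_cohort(records: Iterable[Tuple[int, float]]) -> Dict[str, List[float]]:
    """Group (year, lunar progress) pairs into half-century cohorts, dropping other years."""
    groups: Dict[str, List[float]] = {}
    for year, progress in records:
        label = cohort_label(year)
        if label is not None:
            groups.setdefault(label, []).append(progress)
    return groups
```

Each cohort's list of progress values can then be histogrammed independently, which yields one panel per half-century.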
The key diagnostic is consistency: a genuine lunar effect — biological,
gravitational, or cultural — would produce a similar pattern in every
cohort, since the Moon's cycle has not changed. What the figure shows
instead is that the small fluctuations in each panel occur at different
phases from cohort to cohort. A bump visible in one half-century is
absent or reversed in another, and no phase-specific feature recurs
reliably across all four windows.
This inconsistency across eras is strong evidence that the minor
deviations from flat in each panel reflect sampling noise and
record-coverage artefacts rather than any underlying lunar signal.
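One way to put a number on this consistency argument is to correlate the cohorts' deviations from flat. This is offered as an illustration, not as the analysis behind the figure. A shared lunar pattern would push every off-diagonal correlation towards +1, while independent noise leaves the correlations scattered around zero. The input format, a dict of normalised histograms keyed by cohort label, is an assumption.

```python
import numpy as np


def deviation_correlations(normalised_by_cohort):
    """Pearson correlation matrix between cohorts' deviations from a flat histogram.

    Subtracting 1.0 does not change the correlations (np.corrcoef removes each
    row's mean); it only makes explicit that deviations are being compared.
    """
    labels = sorted(normalised_by_cohort)
    deviations = np.array([np.asarray(normalised_by_cohort[k], dtype=float) - 1.0 for k in labels])
    return labels, np.corrcoef(deviations)
```

With only 30 bins, correlations between pure-noise histograms commonly reach roughly ±0.4. A single large value is therefore weak evidence, while consistently high values across all pairs would not be.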
Frequency analysis: searching for a hidden periodic signal
Visual inspection of histograms can miss weak but consistent periodic
patterns. To apply a more sensitive test, a discrete Fourier transform (DFT),
computed with the fast Fourier transform (FFT) algorithm, was applied to the
birth-count histogram of each half-century cohort. The transform decomposes
the distribution into sinusoidal components and returns the amplitude at
each frequency. Here, each frequency corresponds to a hypothetical rhythmic
pattern repeating a given number of times per lunar cycle.
If births clustered around full moon and new moon equally, we would expect
a peak at frequency 2 (period = half a lunar cycle, ≈ 14.8 days).
If they clustered around only one phase, we would expect a peak at
frequency 1 (period = one full cycle, ≈ 29.5 days).
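NumPy's real FFT makes this frequency convention concrete. For a histogram whose bins span exactly one lunar cycle, index k of the output is the amplitude of a pattern repeating k times per lunar month. That pattern has a period of 29.53 / k days, which is about 14.8 days for k = 2 and 2.46 days for k = 12. Detrending by subtracting the mean is an assumption of this sketch, and the project's detrending step may differ.

```python
import numpy as np

SYNODIC_MONTH = 29.530588853  # mean synodic month, days


def lunar_spectrum(counts):
    """Amplitude spectrum of a histogram spanning exactly one lunar cycle.

    Index k of the result is frequency k (cycles per lunar month). The mean is
    removed first as a simple detrend, so the k = 0 amplitude is (numerically) zero.
    """
    counts = np.asarray(counts, dtype=float)
    detrended = counts - counts.mean()
    amplitudes = np.abs(np.fft.rfft(detrended))
    frequencies = np.arange(amplitudes.size)  # rfft of n bins gives n // 2 + 1 frequencies
    return frequencies, amplitudes


def period_days(k):
    """Period in days of a pattern repeating k times per lunar month."""
    return SYNODIC_MONTH / k
```

A histogram with a pure twice-per-cycle modulation puts all of its power at k = 2. A full-moon-only excess would instead show up mainly at k = 1.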
Neither of these signatures appeared consistently. While individual
cohorts show spectral peaks at various frequencies, those peaks are
located at different frequencies in different half-centuries and their
amplitudes are well within the range expected from random fluctuations
in a dataset of this size. No frequency stands out reproducibly across
all four time windows.
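Simulation offers one way to calibrate what counts as "within the range expected from random fluctuations". This sketch is one possible approach, not the documented method. It draws uniform histograms containing the same number of births, computes the same mean-removed amplitude spectrum, and reports a per-frequency percentile. It models counting noise only, so heaping and uneven coverage would raise the real noise floor. Checking many frequencies at once also makes an isolated exceedance more likely by chance.

```python
import numpy as np


def null_amplitude_threshold(n_births, n_bins=30, n_sims=1000, quantile=0.99, seed=0):
    """Per-frequency amplitude that uniform random histograms exceed (1 - quantile) of the time.

    Each simulation places n_births births uniformly over n_bins phase bins
    (one multinomial draw), subtracts the mean and takes the rFFT amplitude.
    """
    rng = np.random.default_rng(seed)
    probs = np.full(n_bins, 1.0 / n_bins)
    sims = rng.multinomial(n_births, probs, size=n_sims).astype(float)
    sims -= sims.mean(axis=1, keepdims=True)   # same detrend as the observed spectrum
    amps = np.abs(np.fft.rfft(sims, axis=1))   # shape (n_sims, n_bins // 2 + 1)
    return np.quantile(amps, quantile, axis=0)
```

An observed amplitude at frequency k below `null_amplitude_threshold(2_813_850)[k]` is consistent with counting noise at that frequency.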
The Fourier test is particularly telling because it is sensitive to
even subtle, persistent periodicities that might not be visible to the
eye. The absence of a stable dominant frequency across cohorts means
that, if a lunar effect on birth timing exists at all, its magnitude
in this dataset is indistinguishable from noise.
⚠ Data-quality artefact at f ≈ 12: before quality filtering,
the power spectrum showed a spurious dominant peak near frequency 12 (period
≈ 2.5 days) across all cohorts. Investigation revealed systematic
date heaping in Wikidata: a substantial fraction of records have
day = 01 entered as a placeholder by editors who knew only the birth
month and year, yet marked the claim as day-precision. Because calendar months
(≈30.44 days) and the lunar month (29.53 days) are incommensurable,
these 12 heaped dates map to 12 unevenly spaced positions in the lunar cycle,
generating the observed oscillation. Further analysis is required to fully
understand the origin of this artefact. No analogous peak appears at f = 1.
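A simple day-of-month count can check the day = 01 heaping. This is a sketch with hypothetical function names, and the source does not say how heaped records were identified or filtered. In the Gregorian calendar, days 1 to 28 occur in all 12 months, day 29 occurs in 11 months plus February of leap years, day 30 in 11 months and day 31 in 7 months. A ratio well above 1 for day 1 therefore indicates placeholder dates.

```python
from collections import Counter

DAYS_PER_YEAR = 365.2425  # mean Gregorian year


def expected_day_share(day: int) -> float:
    """Expected fraction of uniformly spread births falling on a given day of the month."""
    if day <= 28:
        months = 12
    elif day == 29:
        months = 11 + 97 / 400  # February 29 occurs in 97 of every 400 Gregorian years
    elif day == 30:
        months = 11
    else:
        months = 7
    return months / DAYS_PER_YEAR


def heaping_ratio(dates, day: int = 1) -> float:
    """Observed / expected share of (year, month, day) records on `day`; 1.0 means no heaping."""
    counts = Counter(d for _, _, d in dates)
    total = sum(counts.values())
    return (counts[day] / total) / expected_day_share(day)
```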
Generated from MoonBirths outputs in output/. Run python main.py to refresh figures.