MoonBirths

Do births cluster around moon phases?

This project extracts all human birth dates recorded in Wikidata, maps each date to its position in the lunar cycle, and searches for any periodic signal using distribution analysis and Fourier methods across two centuries of records.

The belief that full moons trigger a higher number of births is among the most persistent myths in popular culture and nursing folklore. We tested this claim using the largest structured biographical dataset publicly available: the Wikidata knowledge graph, which at the time of extraction contained approximately 110 million entity records. After filtering for humans with day-precision birth dates, removing duplicated and mislabelled records, and computing the lunar cycle position of each birth date using high-precision astronomical ephemerides (pyephem), we obtained a final sample of 2,813,850 births spanning the period 1800–1999.

The distribution of births across the lunar cycle is statistically indistinguishable from uniform in all four half-century sub-periods examined. A power spectral analysis of the binned distribution reveals no significant signal at the fundamental lunar frequency (f = 1, one oscillation per lunar cycle). Apparent minor oscillations near f = 12 can be explained by a Wikidata data-quality artefact: birth dates heaped on the first day of the month. These results constitute the largest population-level test of the lunar birth hypothesis to date and provide strong evidence against any association between lunar phase and birth frequency.

Approach

Birth dates are extracted from the full Wikidata JSON dump via a multiprocessing pipeline and stored as partitioned Parquet files. For each birth date, the lunar cycle progress is computed using astronomical ephemerides from pyephem. This progress is a value that runs from 0 (new moon) through 0.5 (full moon) to 1 (next new moon).
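The two steps described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's pipeline: the function names are hypothetical, the multiprocessing and Parquet stages are omitted, and the pyephem ephemerides are replaced by a mean-lunation approximation that can be off by several hours. The Wikidata identifiers used (P31 "instance of", Q5 "human", P569 "date of birth", precision 11 for day-level dates) are the standard ones.

```python
import json
from datetime import date, datetime, timezone

DAY_PRECISION = 11                  # Wikidata time precision code for "day"
SYNODIC_MONTH_DAYS = 29.530588853   # mean length of one lunation
# A known new moon used as the zero point (6 January 2000, 18:14 UTC).
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def parse_birth_date(line):
    """Return the day-precision birth date of a human entity, else None.

    `line` is one line of the Wikidata JSON dump: a single entity object,
    usually followed by a comma, inside an enclosing "[" ... "]" array.
    """
    line = line.strip().rstrip(",")
    if line in ("", "[", "]"):
        return None
    claims = json.loads(line).get("claims", {})
    is_human = any(
        c.get("mainsnak", {}).get("datavalue", {}).get("value", {}).get("id") == "Q5"
        for c in claims.get("P31", [])
    )
    if not is_human:
        return None
    for c in claims.get("P569", []):
        value = c.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if value.get("precision") != DAY_PRECISION:
            continue
        t = value.get("time", "")   # e.g. "+1879-03-14T00:00:00Z"
        if not t.startswith("+"):
            continue                # skip BCE or malformed values
        try:
            year, month, day = (int(p) for p in t[1:].split("T")[0].split("-"))
            return date(year, month, day)
        except ValueError:
            continue                # e.g. month or day given as 00
    return None


def lunar_progress(birth):
    """Approximate cycle position in [0, 1): 0 = new moon, 0.5 = full moon.

    Assumption: mean synodic month counted from a reference new moon,
    evaluated at 12:00 UTC because the hour of birth is unknown.
    """
    noon = datetime(birth.year, birth.month, birth.day, 12, tzinfo=timezone.utc)
    days = (noon - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return (days / SYNODIC_MONTH_DAYS) % 1.0
```

For example, `lunar_progress(date(2000, 1, 21))` is close to 0.5, since a full moon fell on that date. An ephemeris-based computation is still preferable for the real analysis, because the true length of a lunation varies from cycle to cycle.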

The resulting distribution of births across lunar phases is then analysed both visually and with Fourier analysis to test whether any phase-specific elevation persists across different historical periods.

What was tested

  • Record availability and temporal coverage by birth year
  • Global birth distribution across the full 29.5-day lunar cycle
  • Stability across half-century cohorts (1800–1999) to detect any era-specific pattern
  • Fourier frequency content of detrended birth-count histograms to probe for any weak but consistent periodic signal

Distribution of records by birth year

Figure: Record availability by birth year

The bar chart shows the number of birth records per year extracted from Wikidata. Coverage is strikingly uneven: before 1800 it is sparse, confined almost entirely to historically prominent figures — monarchs, scientists, philosophers, military commanders — whose dates of birth were preserved in written records. From the mid-19th century onward, counts rise sharply as civil registration became widespread across Europe and North America, and as more recent biographies entered Wikidata in bulk.

This non-uniform sampling is a fundamental caveat for the entire analysis. The dataset is not a random draw from all human births; it is a heavily biased sample skewed toward notable individuals and toward the modern era. Any lunar signal detected in the aggregate must therefore survive this sampling structure to be considered meaningful.

Birth distribution across the lunar cycle

Figure: Birth distribution across the lunar cycle, all years combined

The x-axis spans one complete lunar cycle: 0 corresponds to new moon, 0.5 to full moon, and 1 to the following new moon. Each bin counts the number of births occurring when the Moon was at that phase, normalised so that a perfectly uniform distribution would appear as a flat line.
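The binning and normalisation described above can be sketched as follows. The function name and the choice of 60 bins are assumptions made for illustration; the bin count used in the project is not stated here.

```python
def normalised_histogram(phases, n_bins=60):
    """Bin lunar-cycle positions in [0, 1) and scale so that uniform = 1.0.

    `n_bins` is a hypothetical choice, not a value taken from the project.
    """
    if not phases:
        raise ValueError("at least one phase value is required")
    counts = [0] * n_bins
    for p in phases:
        # Clamp so that a value of exactly 1.0 falls into the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    expected = len(phases) / n_bins  # count per bin under uniformity
    return [c / expected for c in counts]
```

With this scaling, a bin value of 1.02 means 2% more births than a uniform distribution would place at that phase.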

Across the entire dataset, the histogram is strikingly flat. There is no sustained elevation around full moon — the phase most commonly invoked in folk claims about lunar influence — nor around any other particular phase. Fluctuations between bins are small and consistent with statistical noise expected in a large but irregularly sampled population.
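One way to quantify "consistent with statistical noise" is a chi-square comparison of the bin counts against a uniform expectation. The sketch below is an assumption about how such a check could be done, not a description of the test used in the project. It reports the statistic as an approximate z-score, using the fact that a chi-square variable with k degrees of freedom has mean k and variance 2k.

```python
import math


def uniformity_z_score(counts):
    """Chi-square statistic against uniform, expressed as an approximate z-score.

    Values near 0 are what uniformity predicts; large positive values
    indicate more bin-to-bin variation than sampling noise explains.
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        raise ValueError("need at least two bins and one observation")
    expected = total / len(counts)
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    dof = len(counts) - 1
    return (chi2 - dof) / math.sqrt(2 * dof)
```

The function takes raw counts rather than normalised values, because the chi-square statistic depends on the sample size.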

If the Moon exerted a meaningful influence on birth timing, we would expect a reproducible peak (or trough) at one or more phase values. The absence of such a feature in the aggregate distribution is the first and most direct negative result.

Birth distributions split by half-century

Figure: Birth distribution by half-century (1800–1999)

To rule out the possibility that a real signal is obscured by combining very different populations, the data are split into four half-century cohorts: 1800–1849, 1850–1899, 1900–1949, and 1950–1999. Each panel shows the birth distribution across the lunar cycle independently for that period.

The key diagnostic is consistency: a genuine lunar effect — biological, gravitational, or cultural — would produce a similar pattern in every cohort, since the Moon's cycle has not changed. What the figure shows instead is that the small fluctuations in each panel occur at different phases from cohort to cohort. A bump visible in one half-century is absent or reversed in another, and no phase-specific feature recurs reliably across all four windows.
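The consistency argument can be made concrete by correlating the normalised histograms of two cohorts: a shared lunar pattern would give a clearly positive correlation, while independent noise gives values near zero. The sketch below is one possible check, with hypothetical function names; the project itself relies on the visual comparison described above.

```python
import math

COHORTS = [(1800, 1849), (1850, 1899), (1900, 1949), (1950, 1999)]


def cohort_of(year):
    """Return the (start, end) half-century containing `year`, or None."""
    for start, end in COHORTS:
        if start <= year <= end:
            return (start, end)
    return None


def pearson(xs, ys):
    """Pearson correlation between two equally long histograms."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a perfectly flat histogram carries no pattern to compare
    return cov / (norm_x * norm_y)
```

Applied to the six cohort pairs, correlations scattered around zero would support the reading that the bumps in each panel are unrelated to one another.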

This inconsistency across eras is strong evidence that the minor deviations from flat in each panel reflect sampling noise and record-coverage artefacts rather than any underlying lunar signal.

Frequency analysis: searching for a hidden periodic signal

Visual inspection of histograms can miss weak but consistent periodic patterns. As a more sensitive test, a discrete Fourier transform (computed with the FFT algorithm) was applied to the birth-count histogram of each half-century cohort. The transform decomposes the distribution into sinusoidal components and returns the amplitude at each frequency. In this context, frequency f corresponds to a hypothetical rhythmic pattern that repeats f times per lunar cycle.

If births clustered around full moon and new moon equally, we would expect a peak at frequency 2 (period = half a lunar cycle, ≈ 14.75 days). If they clustered around only one phase, we would expect a peak at frequency 1 (period = one full cycle, ≈ 29.5 days).
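The two signatures can be checked on synthetic histograms. The sketch below uses a direct discrete Fourier transform written with the standard library, which is adequate for a few dozen bins; an FFT routine such as `numpy.fft.rfft` would be the usual choice in practice. The bin count, the 5% modulation depth, and the use of mean subtraction as the detrending step are all assumptions made for illustration.

```python
import cmath
import math


def amplitude_spectrum(values):
    """DFT amplitudes of mean-subtracted values for frequencies 0..N//2."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * f * k / n)
                for k, x in enumerate(centred))) / n
        for f in range(n // 2 + 1)
    ]


def dominant_frequency(values):
    """Frequency (cycles per lunar month) with the largest amplitude."""
    spectrum = amplitude_spectrum(values)
    return spectrum.index(max(spectrum))


N_BINS = 60  # hypothetical bin count
# One excess per cycle, centred on full moon (phase 0.5).
one_peak = [1 + 0.05 * math.cos(2 * math.pi * (k / N_BINS - 0.5))
            for k in range(N_BINS)]
# Two equal excesses per cycle, at new moon (0) and full moon (0.5).
two_peaks = [1 + 0.05 * math.cos(2 * math.pi * 2 * k / N_BINS)
             for k in range(N_BINS)]
```

Here `dominant_frequency(one_peak)` is 1 and `dominant_frequency(two_peaks)` is 2, matching the reasoning above. The amplitude says how strong a component is but not where its peak lies; the phase of the complex coefficient carries that information.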

Neither of these signatures appeared consistently. While individual cohorts show spectral peaks at various frequencies, those peaks are located at different frequencies in different half-centuries and their amplitudes are well within the range expected from random fluctuations in a dataset of this size. No frequency stands out reproducibly across all four time windows.
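The range expected from random fluctuations can be estimated by simulation: draw the same number of births uniformly across the cycle many times, and record the largest spectral amplitude in each draw. The sketch below is one possible way to do this and is not taken from the project; the function names, bin count, trial count, and quantile are all hypothetical.

```python
import cmath
import math
import random


def max_amplitude(normalised):
    """Largest DFT amplitude over frequencies 1..N//2 of a normalised histogram."""
    n = len(normalised)
    mean = sum(normalised) / n
    best = 0.0
    for f in range(1, n // 2 + 1):
        coeff = sum((v - mean) * cmath.exp(-2j * math.pi * f * k / n)
                    for k, v in enumerate(normalised))
        best = max(best, abs(coeff) / n)
    return best


def null_threshold(n_births, n_bins=60, n_trials=100, quantile=0.95, seed=0):
    """Quantile of the maximum amplitude when births are uniform over the cycle."""
    rng = random.Random(seed)
    expected = n_births / n_bins
    maxima = []
    for _ in range(n_trials):
        counts = [0] * n_bins
        for _ in range(n_births):
            counts[rng.randrange(n_bins)] += 1
        maxima.append(max_amplitude([c / expected for c in counts]))
    maxima.sort()
    return maxima[min(int(quantile * n_trials), n_trials - 1)]
```

An observed peak below the threshold for the matching cohort size is compatible with noise. Because noise amplitudes shrink roughly as one over the square root of the number of births, each cohort needs its own threshold.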

The Fourier test is particularly telling because it is sensitive to even subtle, persistent periodicities that might not be visible to the eye. The absence of a stable dominant frequency across cohorts means that, if a lunar effect on birth timing exists at all, its magnitude in this dataset is indistinguishable from noise.

⚠ Data-quality artefact at f ≈ 12: before quality filtering, the power spectrum showed a spurious dominant peak near frequency 12 (period ≈ 2.5 days) in all cohorts. Investigation revealed systematic date heaping in Wikidata: a significant fraction of records have day = 01 entered as a placeholder by editors who knew only the birth month and year, yet marked the claim as day-precision. Because the calendar month (≈ 30.44 days on average) and the lunar month (29.53 days) are incommensurable, the 12 heaped first-of-month dates in a year map to 12 unevenly spaced positions in the lunar cycle, generating the observed oscillation. Further analysis is required to characterise this artefact fully. No analogous peak appears at f = 1.
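Date heaping of this kind can be detected without any lunar computation, by comparing the share of births recorded on the first of the month with the share expected if days were used evenly. The sketch below is an illustrative check with a hypothetical function name, not the project's quality filter.

```python
from datetime import date

# Twelve first-of-month days out of an average Gregorian year.
EXPECTED_FIRST_OF_MONTH_SHARE = 12 / 365.2425


def day_heaping_ratio(dates, day=1):
    """Observed share of births on `day` of the month, relative to expectation.

    A ratio near 1 indicates no heaping; a ratio well above 1 suggests
    that `day` is being used as a placeholder. The expectation is exact
    only for days 1 to 28, which occur in every month.
    """
    if not dates:
        raise ValueError("at least one date is required")
    observed = sum(1 for d in dates if d.day == day) / len(dates)
    return observed / EXPECTED_FIRST_OF_MONTH_SHARE


# Usage with a few made-up dates: two of the four fall on the 1st.
sample = [date(1901, 1, 1), date(1901, 3, 1), date(1901, 3, 17), date(1902, 7, 9)]
ratio = day_heaping_ratio(sample)
```

Computing the ratio separately for each cohort would show whether the placeholder habit is concentrated in particular eras.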