Methodology — Recense Docs

Design principles

Recense favours conservative defaults that match published survey-research practice. Where two reasonable methods exist for the same problem, we pick the one that is more defensible to a sceptical reviewer.

Every method below cites the published reference it comes from. The full reference list lives in the engine's references document — search for the [REF-xxx] tag to find author, year, and notes.

Independence by default. Without evidence of association we assume none — the maximum-entropy choice (REF-jaynes1957).
Show the unweighted base alongside the weighted estimate. Weighting changes the point estimate; readers still need to see the actual sample.
Treat missing values explicitly. Never silently drop, never silently impute.

Significance testing

Recense uses the test that matches your data type. The Significance pill on a table chooses the test from the row and column variables; you can override its choice.

Chi-square: Categorical × categorical tables. The standard Pearson chi-square of association — appropriate when expected cell counts are ≥ 5.
Z-test (two proportions): Pairwise comparison of column percentages within a categorical-by-categorical table. Used for the column-letter annotations (A, B, C) you see in cells.
t-test: Numeric measure (mean, sum) compared across categorical groups. Welch's t-test by default — does not assume equal variances.
Bonferroni correction: Optional multiple-comparison correction for tables with many pairwise tests. Multiplies p-values by the number of comparisons; conservative.
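
As a rough illustration of the pairwise column comparison and the Bonferroni step described above, here is a minimal Python sketch. It is not Recense's implementation: the function names, the pooled standard error, and the example counts are assumptions made for the example.

```python
import math
from itertools import combinations


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both columns are at 0% or 100%: no detectable difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical columns: (respondents selecting the row, column base).
columns = {"A": (120, 300), "B": (90, 300), "C": (100, 250)}

pairs = list(combinations(columns, 2))
raw = [two_proportion_z(*columns[a], *columns[b])[1] for a, b in pairs]
adjusted = bonferroni(raw)

for (a, b), p, p_adj in zip(pairs, raw, adjusted):
    print(f"{a} vs {b}: p = {p:.4f}, Bonferroni-adjusted p = {p_adj:.4f}")
```

With three columns there are three pairwise tests, so each raw p-value is multiplied by 3. A difference that is significant on its own can stop being significant after the correction, which is the conservative behaviour noted above.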

Weighting

Recense applies survey weights at the cell level — every measure (count, percentage, mean, index) recomputes against the weighted base. The unweighted base row stays visible for context.

By default, Recense uses the weight variable designated in the SPSS file. You can override it per table from the Weight pill.

Standard weighted estimators — Horvitz–Thompson for totals, weighted ratio for percentages, weighted mean for scale variables.
Effective sample size shown alongside the weighted base. Large differences signal a high design effect.
Significance tests use the weighted estimate and apply an approximate design-effect adjustment (Kish DEFF). Reported p-values therefore reflect the effective sample size rather than the nominal base.
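
The quantities in this list can be sketched in a few lines of Python. The sketch uses Kish's approximation, n_eff = (Σw)² / Σw², and a weighted ratio for the percentage. The function names and the toy weights are assumptions for illustration, and exactly how Recense feeds the design effect into its tests is not shown here.

```python
def weighted_percentage(selected, weights):
    """Weighted ratio estimate: weight of selectors over total weight, in percent."""
    selected_weight = sum(w for s, w in zip(selected, weights) if s)
    return 100.0 * selected_weight / sum(weights)


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def kish_deff(weights):
    """Approximate design effect from weighting: actual n over effective n."""
    return len(weights) / kish_effective_n(weights)


# Hypothetical data: four respondents, two down-weighted and two up-weighted.
weights = [0.5, 0.5, 1.5, 1.5]
selected = [True, True, False, False]

unweighted_pct = 100.0 * sum(selected) / len(selected)
weighted_pct = weighted_percentage(selected, weights)

print(f"unweighted base: {len(weights)}")
print(f"unweighted %:    {unweighted_pct:.1f}")
print(f"weighted %:      {weighted_pct:.1f}")
print(f"effective n:     {kish_effective_n(weights):.2f}")
print(f"design effect:   {kish_deff(weights):.2f}")
```

Equal weights give an effective n equal to the actual n and a design effect of 1. The more the weights vary, the smaller the effective n becomes, and the wider the uncertainty around the weighted estimate.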

Missing values

SPSS missing-value definitions (discrete and range) import directly. CSV imports detect sentinel values (empty cells, common missing codes), which you can confirm in the review step.

Missing values are excluded from the base of any measure that references the variable — never silently coded as zero.
The base row reflects the actual respondents who answered the question, not the dataset N.
Multi-response variables: a respondent who answered any item contributes to the base; missing on a single item doesn't exclude them from the question.
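
The base rules above can be illustrated with a small Python sketch. The data layout (None for a missing value, one list of item values per respondent) and the function names are assumptions for the example, not Recense's internal representation.

```python
def base_and_mean(values):
    """Exclude missing values (None) from the base before computing the mean."""
    answered = [v for v in values if v is not None]
    base = len(answered)
    mean = sum(answered) / base if base else None
    return base, mean


def multi_response_base(rows):
    """Count respondents who answered at least one item of the question.

    Each row holds one respondent's item values:
    1 = selected, 0 = not selected, None = missing.
    """
    return sum(1 for items in rows if any(v is not None for v in items))


# Hypothetical scale question: two of five respondents did not answer.
scores = [4, None, 5, 3, None]
base, mean = base_and_mean(scores)

# What silently coding missing as zero would do to the same data.
zero_coded_mean = sum(v or 0 for v in scores) / len(scores)

# Hypothetical multi-response question with three items.
rows = [
    [1, 0, None],        # missing on one item only: still in the base
    [None, None, None],  # skipped the whole question: excluded
    [0, 0, 0],           # answered, selected nothing: in the base
]

print(f"base = {base}, mean = {mean}")
print(f"mean if missing were coded as zero = {zero_coded_mean}")
print(f"multi-response base = {multi_response_base(rows)}")
```

In this toy data the mean over the three actual answers is 4.0, while zero-coding the two missing answers would pull it down to 2.4, which is why the base excludes them.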

Multiple-response questions

Multi-response (multi-punch) variables are detected from question-group structure on import. A multi-response cell shows the proportion of the base that selected each option.

Column percentages can sum to more than 100% by design, because a respondent can select several options.
NET aggregations combine selected items into a single row using union semantics (a respondent counts once even if they selected several items in the NET).
Significance testing on multi-response uses category-pair z-tests; chi-square is not applied across multi-response columns.
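
A toy example makes the percentage and NET behaviour concrete. This is a sketch under assumed data, with each respondent represented as the set of options they selected; the option names and the `net_percentage` helper are hypothetical.

```python
# Hypothetical multi-response data: one set of selected options per respondent.
respondents = [
    {"tea", "coffee"},
    {"coffee"},
    {"tea", "juice"},
    set(),  # answered the question but selected nothing
]
options = ["tea", "coffee", "juice"]
base = len(respondents)

# Proportion of the base that selected each option.
percentages = {
    option: 100.0 * sum(option in chosen for chosen in respondents) / base
    for option in options
}


def net_percentage(respondents, items):
    """Union semantics: a respondent counts once if they selected any item."""
    items = set(items)
    hits = sum(1 for chosen in respondents if chosen & items)
    return 100.0 * hits / len(respondents)


hot_drinks_net = net_percentage(respondents, ["tea", "coffee"])
summed_rows = percentages["tea"] + percentages["coffee"]

print(percentages)
print(f"sum of column percentages: {sum(percentages.values())}")
print(f"NET (tea or coffee):       {hot_drinks_net}")
print(f"tea % + coffee %:          {summed_rows}")
```

The first respondent selected both tea and coffee, so adding the two rows counts them twice. The NET counts them once, which is why it comes out lower than the sum of its component rows.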

Derived dimensions and calculated cells

Expressions (NET, TOP_BOX, OR/AND/NOT) operate on the underlying response codes, not on already-computed percentages. A NET of three items is the proportion of the base that selected at least one of them.
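
To show what operating on the underlying response codes means, here is a minimal Python sketch of a top-box calculation on a rating scale. The definition used (share of respondents in the highest `n_boxes` scale points) is a common convention and an assumption here, as are the function name, its parameters, and the sample codes; Recense's TOP_BOX syntax may differ.

```python
def top_box(codes, scale_max, n_boxes=2):
    """Percentage of answering respondents whose code is in the top n scale points.

    Works on raw response codes; None marks a missing answer and is
    excluded from the base.
    """
    answered = [c for c in codes if c is not None]
    if not answered:
        return None
    top_codes = {scale_max - i for i in range(n_boxes)}
    hits = sum(1 for c in answered if c in top_codes)
    return 100.0 * hits / len(answered)


# Hypothetical 5-point scale responses, one code per respondent.
codes = [5, 4, 3, None, 2, 5, 1, 4]

print(f"top-2 box: {top_box(codes, scale_max=5):.1f}%")
print(f"top-1 box: {top_box(codes, scale_max=5, n_boxes=1):.1f}%")
```

Because the calculation starts from the codes, the base and the count are taken from the same respondents, and no already-rounded percentages are combined along the way.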

Formula cells operate on the displayed table values. They recompute when their source tables change but do not look through to the raw data.

Reference list

The full citation list with author, year, and notes lives in the engine's references document. Each tag below links a method back to the published source.

REF-groves2009: Groves et al. (2009). Survey Methodology. Foundational reference for design effects and weighting.
REF-deming-stephan1940: Deming & Stephan (1940). The original raking / IPF paper.
REF-csiszar1975: Csiszár (1975). I-projection theory — uniqueness and existence of weighted solutions.
REF-little-rubin2002: Little & Rubin (2002). Statistical Analysis with Missing Data.
REF-agresti2013: Agresti (2013). Categorical Data Analysis. Log-linear models, association measures.
REF-efron-tibshirani1994: Efron & Tibshirani (1994). Bootstrap methods for uncertainty.