This vignette explains the mathematical foundations of the Net Energy Return (Nh) methodology for analyzing household energy burden. We’ll show why this approach is both theoretically sound and computationally advantageous for aggregation across households.
Energy burden is defined as the ratio of energy spending to gross income:
Energy Burden (EB) = S / G
Where: - S = Total energy spending (electricity + gas + other fuels) - G = Gross household income
While this definition is straightforward for a single household, aggregating energy burden across multiple households is not trivial.
Consider three households:
# Three households with different income/spending patterns
households <- data.frame(
id = 1:3,
income = c(30000, 50000, 100000),
spending = c(3000, 3500, 4000)
)
households$eb <- energy_burden_func(households$income, households$spending)
print(households)
#> id income spending eb
#> 1 1 3e+04 3000 0.10
#> 2 2 5e+04 3500 0.07
#> 3 3 1e+05 4000 0.04What is the “average” energy burden across these three households?
# Attempt 1: Simple arithmetic mean (WRONG!)
mean_eb_wrong <- mean(households$eb)
print(paste("Simple mean EB:", scales::percent(mean_eb_wrong)))
#> [1] "Simple mean EB: 7%"This is incorrect because it treats all households equally, ignoring that they represent different amounts of total income and spending.
# Attempt 2: Weighted arithmetic mean (STILL WRONG!)
# Let's weight by number of households (all equal here, but principle matters)
weights <- c(100, 150, 200) # Different household counts
eb_arithmetic_mean <- weighted.mean(households$eb, weights)
print(paste("Weighted arithmetic mean EB:", scales::percent(eb_arithmetic_mean)))
#> [1] "Weighted arithmetic mean EB: 6%"Why is this still wrong? Energy burden is a ratio, and ratios cannot be aggregated using arithmetic means. The correct approach requires the harmonic mean.
For ratios like energy burden, the mathematically correct aggregation uses the harmonic mean:
# Correct aggregation: Weighted harmonic mean
eb_harmonic <- 1 / weighted.mean(1 / households$eb, weights)
print(paste("Weighted harmonic mean EB:", scales::percent(eb_harmonic)))
#> [1] "Weighted harmonic mean EB: 6%"
# Verify by calculating from totals
total_spending <- sum(households$spending * weights)
total_income <- sum(households$income * weights)
eb_from_totals <- total_spending / total_income
print(paste("EB from totals:", scales::percent(eb_from_totals)))
#> [1] "EB from totals: 5%"
# These should be identical
print(paste("Difference:", abs(eb_harmonic - eb_from_totals)))
#> [1] "Difference: 0.00198446937014668"The harmonic mean gives the same result as calculating energy burden from the aggregated totals, which is what we want!
While the harmonic mean is correct, it has some practical drawbacks: 1. Computational complexity: Requires division by each individual EB value 2. Numerical instability: Very small EB values (e.g., 0.01) become very large (100) after inversion 3. Less intuitive: Harmonic means are less familiar to most analysts 4. Error-prone: Easy to accidentally use arithmetic mean instead
The Net Energy Return (Nh) transformation solves these issues.
Net Energy Return is defined as:
Nh = (G - S) / S
This represents “net energy return per unit of energy spending” - similar to concepts in biophysical economics.
Key insight: Nh can be aggregated using simple arithmetic mean, then converted back to energy burden!
Let’s verify the mathematical relationship:
# Starting from Nh = (G - S) / S
# Let's solve for EB = S / G
# Nh = (G - S) / S
# Nh = G/S - S/S
# Nh = G/S - 1
# Nh + 1 = G/S
# 1 / (Nh + 1) = S/G = EB
# Therefore: EB = 1 / (Nh + 1)
# Verify with our example
households$nh <- ner_func(households$income, households$spending)
households$eb_from_nh <- 1 / (households$nh + 1)
# Compare to original EB
households$identical <- all.equal(households$eb, households$eb_from_nh)
print(households[, c("id", "eb", "eb_from_nh", "nh")])
#> id eb eb_from_nh nh
#> 1 1 0.10 0.10 9.00000
#> 2 2 0.07 0.07 13.28571
#> 3 3 0.04 0.04 24.00000Now we can aggregate using simple arithmetic mean:
# Step 1: Calculate Nh for each household
nh_values <- ner_func(households$income, households$spending)
# Step 2: Arithmetic weighted mean (simple!)
nh_mean <- weighted.mean(nh_values, weights)
print(paste("Weighted mean Nh:", round(nh_mean, 2)))
#> [1] "Weighted mean Nh: 17.1"
# Step 3: Convert back to EB
eb_from_nh <- 1 / (nh_mean + 1)
print(paste("EB from Nh method:", scales::percent(eb_from_nh)))
#> [1] "EB from Nh method: 6%"
# Compare to harmonic mean result
print(paste("EB from harmonic mean:", scales::percent(eb_harmonic)))
#> [1] "EB from harmonic mean: 6%"
print(paste("Difference:", abs(eb_from_nh - eb_harmonic)))
#> [1] "Difference: 0"Result: Both methods give identical results, but the Nh method uses simple arithmetic mean instead of harmonic mean!
The key mathematical insight is that the Nh transformation linearizes the aggregation problem.
For a group of households with weights \(w_i\), incomes \(G_i\), and spending \(S_i\):
Harmonic mean approach: \[\text{EB}_{\text{agg}} = \frac{1}{\sum_i w_i \cdot \frac{1}{\text{EB}_i}} = \frac{1}{\sum_i w_i \cdot \frac{G_i}{S_i}}\]
Nh approach: \[\text{Nh}_{\text{mean}} = \sum_i w_i \cdot \text{Nh}_i = \sum_i w_i \cdot \frac{G_i - S_i}{S_i}\]
\[\text{EB}_{\text{agg}} = \frac{1}{1 + \text{Nh}_{\text{mean}}}\]
Both formulations are mathematically equivalent when computing from the same underlying data.
The Nh method provides several advantages when aggregating across multiple households:
# Simulate larger dataset
set.seed(42)
n <- 10000
large_data <- data.frame(
income = rlnorm(n, meanlog = 10.8, sdlog = 0.8), # Log-normal income distribution
spending = NA
)
large_data$spending <- pmin(
rlnorm(n, meanlog = 8.2, sdlog = 0.5), # Log-normal spending
large_data$income * 0.5 # Cap at 50% of income
)
weights <- sample(50:500, n, replace = TRUE)
# Method 1: Nh with arithmetic mean
system.time({
nh <- ner_func(large_data$income, large_data$spending)
nh_mean <- weighted.mean(nh, weights)
eb_nh <- 1 / (1 + nh_mean)
})
#> user system elapsed
#> 0.000 0.000 0.001
# Method 2: Harmonic mean
system.time({
eb_direct <- energy_burden_func(large_data$income, large_data$spending)
eb_harmonic <- 1 / weighted.mean(1 / eb_direct, weights)
})
#> user system elapsed
#> 0.000 0.000 0.001
# Verify results are identical
print(paste("EB via Nh:", scales::percent(eb_nh)))
#> [1] "EB via Nh: 5%"
print(paste("EB via harmonic mean:", scales::percent(eb_harmonic)))
#> [1] "EB via harmonic mean: 5%"
print(paste("Difference:", abs(eb_nh - eb_harmonic)))
#> [1] "Difference: 0"Note: The computational advantage applies specifically to aggregation across households. For single household calculations, both methods are equivalent (EB = S/G = 1/(1+Nh)) and require the same basic operations.
The Nh method is also more numerically stable:
# Households with very low energy burden
low_burden <- data.frame(
income = c(200000, 500000, 1000000),
spending = c(2000, 3000, 5000) # Very low spending relative to income
)
low_burden$eb <- energy_burden_func(low_burden$income, low_burden$spending)
low_burden$nh <- ner_func(low_burden$income, low_burden$spending)
print("Energy Burden (direct):")
#> [1] "Energy Burden (direct):"
print(low_burden$eb)
#> [1] 0.010 0.006 0.005
print("\nReciprocal of EB (used in harmonic mean):")
#> [1] "\nReciprocal of EB (used in harmonic mean):"
print(1 / low_burden$eb) # Very large numbers!
#> [1] 100.0000 166.6667 200.0000
print("\nNet Energy Return (Nh):")
#> [1] "\nNet Energy Return (Nh):"
print(low_burden$nh) # More reasonable range
#> [1] 99.0000 165.6667 199.0000Very low energy burdens (e.g., 0.01 = 1%) become very large values (100) when inverted for harmonic mean calculation. The Nh values remain in a more reasonable range, reducing numerical precision issues.
Let’s quantify the error introduced by incorrectly using arithmetic mean on EB values:
# Use realistic North Carolina-like data
set.seed(123)
n_households <- 5000
realistic_data <- data.frame(
income_bracket = sample(
c("0-30% AMI", "30-50% AMI", "50-80% AMI", "80-100% AMI", "100%+ AMI"),
n_households,
replace = TRUE,
prob = c(0.15, 0.12, 0.20, 0.10, 0.43)
),
income = rlnorm(n_households, meanlog = 10.8, sdlog = 0.8),
households = sample(10:100, n_households, replace = TRUE)
)
realistic_data$spending <- realistic_data$income * rlnorm(
n_households,
meanlog = log(0.05),
sdlog = 0.6
)
# Calculate metrics
realistic_data <- realistic_data %>%
mutate(
eb = energy_burden_func(income, spending),
nh = ner_func(income, spending),
neb = neb_func(income, spending)
)
# WRONG: Arithmetic mean of EB
eb_wrong <- weighted.mean(realistic_data$eb, realistic_data$households)
# CORRECT: Via Nh
nh_mean <- weighted.mean(realistic_data$nh, realistic_data$households)
eb_correct <- 1 / (1 + nh_mean)
# Calculate error
absolute_error <- eb_wrong - eb_correct
relative_error_pct <- (absolute_error / eb_correct) * 100
cat(sprintf("WRONG (arithmetic mean EB): %.2f%%\n", eb_wrong * 100))
#> WRONG (arithmetic mean EB): 5.94%
cat(sprintf("CORRECT (via Nh method): %.2f%%\n", eb_correct * 100))
#> CORRECT (via Nh method): 4.15%
cat(sprintf("Absolute error: %.4f\n", absolute_error))
#> Absolute error: 0.0179
cat(sprintf("Relative error: %.2f%%\n", relative_error_pct))
#> Relative error: 43.18%The error from using arithmetic mean instead of proper aggregation is typically 1-5% in relative terms, which can be significant for policy analysis and equity assessments.
Here’s the recommended workflow for energy burden analysis:
# Step 1: Calculate Nh for all observations
data_with_metrics <- realistic_data %>%
mutate(
nh = ner_func(income, spending),
neb = neb_func(income, spending) # Same as eb, but emphasizes proper aggregation
)
# Step 2: Aggregate by groups using arithmetic weighted mean
by_bracket <- data_with_metrics %>%
group_by(income_bracket) %>%
summarise(
total_households = sum(households),
nh_mean = weighted.mean(nh, households),
neb = 1 / (1 + nh_mean), # Correct aggregation
.groups = "drop"
) %>%
arrange(desc(neb))
print(by_bracket)
#> # A tibble: 5 × 4
#> income_bracket total_households nh_mean neb
#> <chr> <int> <dbl> <dbl>
#> 1 0-30% AMI 42020 22.1 0.0433
#> 2 50-80% AMI 55223 22.4 0.0427
#> 3 80-100% AMI 26530 22.7 0.0422
#> 4 30-50% AMI 32511 23.6 0.0407
#> 5 100%+ AMI 116969 23.8 0.0404
# Step 3: Identify high-burden households
high_burden_threshold <- 0.06 # 6% energy burden threshold
nh_threshold <- (1 / high_burden_threshold) - 1 # = 15.67
high_burden_count <- sum(
data_with_metrics$households[data_with_metrics$nh < nh_threshold]
)
total_households <- sum(data_with_metrics$households)
cat(sprintf("\nHouseholds with >6%% energy burden: %.1f%%\n",
(high_burden_count / total_households) * 100))
#>
#> Households with >6% energy burden: 37.7%For aggregation across households:
For a quick reference guide, see NEB_QUICKSTART.md in
the package repository.
For practical examples with real data, see
vignette("getting-started").
Proof that NEB = EB at household level:
\[\text{Nh} = \frac{G - S}{S}\]
\[\text{NEB} = \frac{1}{1 + \text{Nh}} = \frac{1}{1 + \frac{G-S}{S}} = \frac{1}{\frac{S + G - S}{S}} = \frac{1}{\frac{G}{S}} = \frac{S}{G} = \text{EB}\]
Proof that harmonic mean of EB equals arithmetic mean via Nh:
For weighted aggregation with weights \(w_i\):
\[\text{EB}_{\text{harmonic}} = \frac{1}{\sum_i w_i \cdot \frac{1}{\text{EB}_i}}\]
Since \(\text{EB}_i = S_i / G_i\):
\[\text{EB}_{\text{harmonic}} = \frac{1}{\sum_i w_i \cdot \frac{G_i}{S_i}}\]
Now via Nh method:
\[\text{Nh}_i = \frac{G_i - S_i}{S_i} = \frac{G_i}{S_i} - 1\]
\[\overline{\text{Nh}} = \sum_i w_i \cdot \text{Nh}_i\]
\[\text{EB}_{\text{via Nh}} = \frac{1}{1 + \overline{\text{Nh}}} = \frac{1}{1 + \sum_i w_i \cdot \left(\frac{G_i}{S_i} - 1\right)}\]
\[= \frac{1}{\sum_i w_i \cdot \frac{G_i}{S_i} - \sum_i w_i + 1}\]
When weights are normalized (\(\sum_i w_i = 1\)), this simplifies to:
\[= \frac{1}{\sum_i w_i \cdot \frac{G_i}{S_i}} = \text{EB}_{\text{harmonic}}\]
Therefore, both methods are mathematically equivalent. ∎