Help for package eddington

Title:

Compute a Cyclist's Eddington Number

Version:

4.3.0

Description:

Compute a cyclist's Eddington number, including efficient computation of the cumulative E over a vector. A cyclist's Eddington number <https://en.wikipedia.org/wiki/Arthur_Eddington#Eddington_number_for_cycling> is the largest number E such that the cyclist has ridden at least E miles on at least E distinct days. The algorithm in this package improves on the conventional approach: because it does not require the data to be sorted first, both summary and cumulative statistics can be computed in linear time. These functions may also be used to compute h-indices for authors, a metric described by Hirsch (2005) <doi:10.1073/pnas.0507655102>. Both are specific applications of computing the side length of a Durfee square <https://en.wikipedia.org/wiki/Durfee_square>. Some additional author-level metrics, such as the g-index and i10-index, are also included in the package.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Encoding:

UTF-8

LazyData:

true

Depends:

R (≥ 4.2.0)

LinkingTo:

Rcpp

Imports:

Rcpp, R6, methods, xml2

Suggests:

testthat, knitr, rmarkdown, stats, dplyr, tibble

SystemRequirements:

C++17

VignetteBuilder:

knitr

RoxygenNote:

7.3.2

URL:

https://github.com/pegeler/eddington2

BugReports:

https://github.com/pegeler/eddington2/issues

NeedsCompilation:

yes

Packaged:

2026-04-12 03:16:10 UTC; pablo

Author:

Paul Egeler [aut, cre], Tashi Reigle [ctb]

Maintainer:

Paul Egeler <paulegeler@gmail.com>

Repository:

CRAN

Date/Publication:

2026-04-13 21:00:02 UTC

Calculate the cumulative Eddington number

Description

This function is much like E_num except it provides a cumulative Eddington number over the vector rather than a single summary number.

Usage

E_cum(rides)

Arguments

rides

A vector of mileage, where each element represents a single day.

Value

An integer vector the same length as rides.
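
The meaning of the cumulative vector can be illustrated with a naive base R sketch. The helpers `e_num_sorted()` and `e_cum_naive()` are hypothetical names used only for illustration; they sort repeatedly and are far slower than the linear-time approach the package describes.

```r
# Hypothetical reference helpers, not the package's implementation.
# Eddington number of a vector by sorting: count ranks i where the
# i-th longest ride is at least i miles.
e_num_sorted <- function(rides) {
  sorted <- sort(rides, decreasing = TRUE)
  sum(sorted >= seq_along(sorted))
}

# Cumulative version: the Eddington number of each prefix of `rides`.
e_cum_naive <- function(rides) {
  vapply(
    seq_along(rides),
    function(i) e_num_sorted(rides[seq_len(i)]),
    integer(1)
  )
}

e_cum_naive(c(3, 3, 2, 3, 3, 5))  # 1 2 2 3 3 3
```

Element i of the result is the Eddington number after the first i days, so the vector never decreases.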

Get the number of rides required to increment to the next Eddington number

Description

Get the number of rides required to increment to the next Eddington number.

Usage

E_next(rides)

Arguments

rides

A vector of mileage, where each element represents a single day.

Value

A named list with the current Eddington number (E) and the number of rides required to increment by one (req).
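
As a rough illustration of how E and req relate, the sketch below assumes that each additional ride is at least E + 1 miles long. `e_next_sketch()` is a hypothetical helper written in base R, not the package's implementation.

```r
# Hypothetical helper, not the package's implementation.
e_next_sketch <- function(rides) {
  sorted <- sort(rides, decreasing = TRUE)
  E <- sum(sorted >= seq_along(sorted))
  # Reaching E + 1 needs E + 1 days of at least E + 1 miles; some may
  # already exist in the data.
  req <- (E + 1L) - sum(rides >= E + 1L)
  list(E = E, req = req)
}

e_next_sketch(c(3, 3, 2, 3, 3, 5))  # E = 3, req = 3
```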

Get the Eddington number for cycling

Description

Gets the Eddington number for cycling. The Eddington number for cycling, E, is the largest number such that a cyclist has ridden at least E miles on at least E distinct days.

Usage

E_num(rides)

Arguments

rides

A vector of mileage, where each element represents a single day.

Details

The Eddington Number for cycling is related to computing the rank of an integer partition, which is the same as computing the side length of its Durfee square. Another relevant application of this metric is computing the Hirsch index (doi:10.1073/pnas.0507655102) for publications.

This is not to be confused with the Eddington Number in astrophysics, N_{Edd}, which represents the number of protons in the observable universe.

Value

An integer which is the Eddington cycling number for the data provided.

Examples

# Randomly generate a set of 15 rides
rides <- rgamma(15, shape = 2, scale = 10)

# View the rides sorted in decreasing order
stats::setNames(sort(rides, decreasing = TRUE), seq_along(rides))

# Get the Eddington number
E_num(rides)
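
For intuition on how an Eddington number can be found without sorting, here is a sketch of one well-known linear-time technique, bucket counting. It is an assumption for illustration only and is not claimed to be the algorithm this package uses; `eddington_by_counting()` is a hypothetical name.

```r
# Hypothetical helper: Eddington number by bucket counting, with no sort.
eddington_by_counting <- function(rides) {
  n <- length(rides)
  if (n == 0L) return(0L)
  # E can never exceed n, so whole-mile lengths are capped at n.
  capped <- pmax(pmin(floor(rides), n), 0)
  # counts[k + 1] is the number of days whose capped length is exactly k.
  counts <- tabulate(capped + 1L, nbins = n + 1L)
  days_at_least <- 0L
  # Scan candidates from largest to smallest, accumulating qualifying days.
  for (e in n:1) {
    days_at_least <- days_at_least + counts[e + 1L]
    if (days_at_least >= e) return(e)
  }
  0L
}

eddington_by_counting(c(3.2, 10, 1, 4.9, 4, 12.5, 6))  # 4
```

Five of the seven rides are at least 4 miles long, but only three are at least 5 miles, so the result is 4.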

Determine the number of additional rides required to achieve a specified Eddington number

Description

Determine the number of additional rides required to achieve a specified Eddington number.

Usage

E_req(rides, candidate)

Arguments

rides

A vector of mileage, where each element represents a single day.

candidate

The Eddington number to test for.

Value

An integer vector of length 1. Returns 0L if the candidate Eddington number is already achieved.

Determine if a dataset satisfies a specified Eddington number

Description

Indicates whether a certain Eddington number is satisfied, given the data.

Usage

E_sat(rides, candidate)

Arguments

rides

A vector of mileage, where each element represents a single day.

candidate

The Eddington number to test for.

Value

A logical vector of length 1.
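
The checks behind E_sat and E_req follow directly from the definition and can be sketched in base R. The helpers below are hypothetical and assume that each additional ride is at least `candidate` miles long.

```r
# Hypothetical helpers illustrating the definitions; not the package's code.

# A candidate is satisfied when at least `candidate` days reach that length.
e_sat_sketch <- function(rides, candidate) {
  sum(rides >= candidate) >= candidate
}

# Additional qualifying rides needed, never fewer than zero.
e_req_sketch <- function(rides, candidate) {
  max(0L, as.integer(candidate) - sum(rides >= candidate))
}

rides <- c(3, 3, 2, 3, 3, 5)
e_sat_sketch(rides, 3)  # TRUE: five days of at least 3 miles
e_req_sketch(rides, 3)  # 0
e_req_sketch(rides, 4)  # 3: only one ride is at least 4 miles
```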

An R6 Class for Tracking Eddington Numbers for Cycling

Description

The class will maintain the state of the algorithm, allowing for efficient updates as new rides come in.

Warnings

The implementation uses an experimental base R feature utils::hashtab.

Cloning of Eddington objects is disabled. Additionally, Eddington objects cannot be serialized; they cannot be carried between sessions using base::saveRDS or base::save and then loaded later using base::readRDS or base::load.

Active bindings

current: The current Eddington number.
cumulative: A vector of cumulative Eddington numbers.
number_to_next: The number of rides needed to get to the next Eddington number.
n: The number of rides in the data.
hashmap: The hash map of rides above the current Eddington number.

Methods

Method `new()`

Create a new Eddington object.

Usage

Eddington$new(rides, store.cumulative = FALSE)

Arguments

rides: A vector of rides
store.cumulative: logical, indicating whether to keep a vector of cumulative Eddington numbers

Returns

A new Eddington object

Method `print()`

Print the current Eddington number.

Usage

Eddington$print()

Method `update()`

Add new rides to the existing Eddington object.

Usage

Eddington$update(rides)

Arguments

rides: A vector of rides

Method `getNumberToTarget()`

Get the number of additional rides of the target length needed to reach a target Eddington number.

Usage

Eddington$getNumberToTarget(target)

Arguments

target: Target Eddington number

Returns

An integer representing the number of rides of target length needed to achieve the target number.

Method `isSatisfied()`

Test if an Eddington number is satisfied.

Usage

Eddington$isSatisfied(target)

Arguments

target: Target Eddington number

Returns

Logical

Examples

# Randomly generate a set of 15 rides
rides <- rgamma(15, shape = 2, scale = 10)

# View the rides sorted in decreasing order
stats::setNames(sort(rides, decreasing = TRUE), seq_along(rides))

# Create the Eddington object
e <- Eddington$new(rides, store.cumulative = TRUE)

# Get the Eddington number
e$current

# Update with new data
e$update(rep(25, 10))

# See the new data
e$cumulative
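
The idea of keeping algorithm state between updates can be sketched with a closure. This is an illustrative assumption about how such a tracker might work, not the package's R6 or C++ code: it uses an environment as the hash map rather than utils::hashtab, and `make_tracker()` is a hypothetical name.

```r
# Hypothetical closure-based tracker; not the package's implementation.
make_tracker <- function() {
  E <- 0L              # current Eddington number
  above <- 0L          # rides whose whole-mile length exceeds E
  counts <- new.env()  # whole-mile length -> number of rides above E

  update <- function(rides) {
    for (r in as.integer(floor(rides))) {
      # Rides no longer than E can never raise the number, so skip them.
      if (r <= E) next
      key <- as.character(r)
      prev <- get0(key, envir = counts, inherits = FALSE, ifnotfound = 0L)
      assign(key, prev + 1L, envir = counts)
      above <<- above + 1L
      if (above > E) {
        E <<- E + 1L
        # Rides of exactly E miles are no longer above the new E.
        key_e <- as.character(E)
        at_e <- get0(key_e, envir = counts, inherits = FALSE, ifnotfound = 0L)
        above <<- above - at_e
        if (at_e > 0L) rm(list = key_e, envir = counts)
      }
    }
    invisible(E)
  }

  list(update = update, current = function() E)
}

tracker <- make_tracker()
tracker$update(c(3, 3, 2))
tracker$current()  # 2
tracker$update(c(3, 3, 5))
tracker$current()  # 3
```

Each new ride is handled in constant expected time, because only the count stored for the new value of E has to be looked up when the number increments.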

An Rcpp Module for Tracking Eddington Numbers for Cycling

Description

A stateful C++ object for computing Eddington numbers.

Arguments

rides

An optional vector of values used to initialize the class.

store_cumulative

Whether to store a vector of the cumulative Eddington number, as accessed from the cumulative property.

Fields

new: Constructor. The parameter list may be empty, contain only store_cumulative, or contain both rides and store_cumulative.
current: The current Eddington number.
cumulative: A vector of Eddington numbers or NULL if store_cumulative is FALSE.
hashmap: A data.frame containing the distances and counts above the current Eddington number.
update: Update the class state with new data.
getNumberToNext: Get the number of additional distances required to reach the next Eddington number.
getNumberToTarget: Get the number of additional distances required to reach a target Eddington number.

Warning

EddingtonModule objects cannot be serialized at this time; they cannot be carried between sessions using base::saveRDS or base::save and then loaded later using base::readRDS or base::load.

Examples

# Create a class instance with some initial data
e <- EddingtonModule$new(c(3, 3, 2), store_cumulative = TRUE)
e$current

# Update with new data and look at the vector of cumulative Eddington numbers.
e$update(c(3, 3, 5))
e$cumulative

# Get the number of rides required to reach the next Eddington number and
# an Eddington number of 4.
e$getNumberToNext()
e$getNumberToTarget(4)

A year of simulated bicycle ride mileages, aggregated by day

Description

Simulated dates and distances of rides occurring in 2009. This is an aggregation of the rides dataset by day.

Usage

daily_totals

Format

A data frame with 178 rows and 2 variables:

ride_date: date the ride occurred
total_length: the total length in miles for each day

Details

The dataset contains a total of 3,419 miles spread across 178 unique days. The Eddington number for the year was 29.
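
An aggregation of this kind can be sketched in base R. The toy data frame below is made up and only mirrors the column names of the rides dataset; how daily_totals was actually built is not stated here, so treat this as an assumption.

```r
# Toy stand-in for the `rides` data set (same column names, made-up values).
toy_rides <- data.frame(
  ride_date = as.Date(c(
    "2009-01-01", "2009-01-01", "2009-01-03", "2009-01-03", "2009-01-04"
  )),
  ride_length = c(5.5, 4.5, 12, 3, 20)
)

# Sum the ride lengths within each date, then rename the total column.
toy_daily <- aggregate(ride_length ~ ride_date, data = toy_rides, FUN = sum)
names(toy_daily)[2] <- "total_length"
toy_daily
```

Five rides on three distinct dates collapse to three rows, one per day.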

Compute the side length of a Durfee square

Description

Compute the side length of the Durfee square of an integer partition.

Usage

durfee(is)

Arguments

is

An integer vector representing an integer partition.

Value

The side length of the Durfee square for that partition.
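
To make the Durfee square concrete, the base R sketch below prints the Ferrers diagram of a small partition and computes the side of the largest square that fits in its top-left corner. It is a hypothetical illustration and does not call durfee().

```r
# A partition of 14, written in non-increasing order.
parts <- c(4L, 3L, 3L, 2L, 1L, 1L)

# Ferrers diagram: one row of stars per part.
ferrers <- vapply(parts, function(p) strrep("*", p), character(1))
cat(ferrers, sep = "\n")

# The Durfee square has side s, where s is the number of rows i whose
# part is at least i.
side <- sum(sort(parts, decreasing = TRUE) >= seq_along(parts))
side  # 3
```

The first three rows each contain at least three stars, while the fourth contains only two, so the largest square has side 3.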

Compute the distance between two points using the Haversine formula

Description

Uses the Haversine great-circle distance formula to compute the distance between two latitude/longitude points.

Usage

get_haversine_distance(
  lat_1,
  lon_1,
  lat_2,
  lon_2,
  units = c("miles", "kilometers")
)

Arguments

lat_1, lon_1, lat_2, lon_2

The coordinates used to compute the distance.

units

The units of the output distance.

Value

The distance between two points in the requested units.

References

https://en.wikipedia.org/wiki/Haversine_formula

Examples

# In NYC, 20 blocks == 1 mile. Thus, computing the distance between two
# points along 7th Ave from W 39 St to W 59 St should return ~1 mile.
w39_coords <- list(lat=40.75406905512651, lon=-73.98830604245481)
w59_coords <- list(lat=40.76684156255418, lon=-73.97908243833855)

get_haversine_distance(
  w39_coords$lat,
  w39_coords$lon,
  w59_coords$lat,
  w59_coords$lon,
  "miles"
)

# The total distance along a sequence of points can be computed. Consider the
# following sequence of points along Park Ave in the form of a list of points
# where each point is a list containing a `lat` and `lon` tag.
park_ave_coords <- list(
  list(lat=40.735337983655434, lon=-73.98973648773142),  # E 15 St
  list(lat=40.74772623378332, lon=-73.98066078090876),   # E 35 St
  list(lat=40.76026319186414, lon=-73.97149360922498),   # E 55 St
  list(lat=40.77301604875587, lon=-73.96217737679450)    # E 75 St
)

# We can create a function to compute the total distance as follows:
compute_total_distance <- function(coords) {
  sum(
    sapply(
      seq_along(coords)[-1],
      \(i) get_haversine_distance(
        coords[[i]]$lat,
        coords[[i]]$lon,
        coords[[i - 1]]$lat,
        coords[[i - 1]]$lon,
        "miles"
      )
    )
  )
}

# Then applying the function to our sequence results in a total distance.
compute_total_distance(park_ave_coords)
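
For reference, the Haversine formula itself can be written in a few lines of base R. This sketch assumes a mean Earth radius of 3958.8 miles; the radius used by get_haversine_distance() is not stated here and may differ. `haversine_sketch()` is a hypothetical name.

```r
# Hypothetical base R version of the Haversine formula.
# `radius` defaults to an assumed mean Earth radius in miles.
haversine_sketch <- function(lat_1, lon_1, lat_2, lon_2, radius = 3958.8) {
  to_rad <- pi / 180
  phi_1 <- lat_1 * to_rad
  phi_2 <- lat_2 * to_rad
  d_phi <- (lat_2 - lat_1) * to_rad
  d_lambda <- (lon_2 - lon_1) * to_rad
  # Haversine of the central angle between the two points.
  a <- sin(d_phi / 2)^2 + cos(phi_1) * cos(phi_2) * sin(d_lambda / 2)^2
  2 * radius * asin(sqrt(a))
}

# W 39 St to W 59 St along 7th Ave: should be close to 1 mile.
haversine_sketch(
  40.75406905512651, -73.98830604245481,
  40.76684156255418, -73.97908243833855
)
```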

Compute several bibliometric indices

Description

Compute bibliometric indices such as the h-index, g-index, and i10-index.

Usage

h_index(citations, na.rm = FALSE)

i10_index(citations, na.rm = FALSE)

g_index(citations, na.rm = FALSE, is_sorted = FALSE)

Arguments

citations

A vector of citation counts.

na.rm

If TRUE, NA values will be filtered out. Otherwise, any NA value found in the vector will propagate and NA will be returned.

is_sorted

Whether the data is pre-sorted in descending order. This may speed up computations for some algorithms. The pre-sorted assumption is tested and a warning is emitted if unsorted data is detected.

Value

The summary number.

Implicit Type Conversions

The h_index() function implicitly coerces its input to an integer vector, which truncates any floating-point values. This usually gives the expected result, because inputs in the intended domain are rarely fractional and these indices are defined explicitly on integral thresholds. However, to keep g-index computation as versatile as possible, g_index() does not perform this coercion. Floating-point input can therefore push the g-index higher in edge cases. For example, g_index(as.integer(daily_totals$total_length)) != g_index(daily_totals$total_length). To get accurate g-index results on data that may have a fractional component, convert the vector to integer before passing it to g_index(), or otherwise validate the inputs.

This integer conversion also causes h_index() to fail when the input contains extremely large values (> 2^{31} - 1). The Eddington number family of functions and durfee() do not have this check and may produce inaccurate output.
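
The effect of fractional input on the g-index can be seen with naive base R definitions of the three indices. These helpers are hypothetical reference definitions, assumed from the usual textbook meanings of the indices, and are not the package's implementations.

```r
# Hypothetical reference definitions, not the package's implementations.

# h-index: number of ranks i whose i-th largest count is at least i.
h_sketch <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  sum(sorted >= seq_along(sorted))
}

# g-index: number of ranks g whose top-g counts sum to at least g^2.
g_sketch <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  sum(cumsum(sorted) >= seq_along(sorted)^2)
}

# i10-index: number of items with at least 10 citations.
i10_sketch <- function(citations) sum(citations >= 10)

fractional <- c(3.9, 3.9, 1.9, 1.9)
g_sketch(fractional)              # 3: sums 3.9, 7.8, 9.7 clear 1, 4, 9
g_sketch(as.integer(fractional))  # 2: after truncation the sums are 3, 6, 7, 8
h_sketch(fractional) == h_sketch(as.integer(fractional))  # TRUE for this input
i10_sketch(c(25, 12, 10, 9, 3))   # 3
```

Because the g-index compares cumulative sums with a threshold, the fractional parts accumulate and can cross a threshold that the truncated values miss.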

References

https://en.wikipedia.org/wiki/Author-level_metrics, https://en.wikipedia.org/wiki/G-index

Define a custom bibliometric index function

Description

Define a custom bibliometric index function.

Usage

index(f, cumulative = FALSE)

Arguments

f

A function applied to the rank index (1, 2, ...) before it is compared with the counts.

cumulative

A logical on whether to apply a cumulative sum to the counts.

Value

A function that will compute the specified index.

Examples

# NOTE: These will all be less performant than their counterparts exported
# in this package, i.e., `h_index()`, `g_index()`, `i10_index()`.
set.seed(2018)
citations <- rgamma(30, shape = 2, scale = 10)

# Create an h-index
my_h_index <- index(force)
my_h_index(citations)

# Create a g-index function
my_g_index <- index(\(i) i * i, cumulative = TRUE)
my_g_index(citations)

# Create an i10-index
my_i10_index <- index(\(i) 10L)
my_i10_index(citations)

Read a GPX file into a data frame containing dates and distances

Description

Reads in a GPS Exchange Format XML document and outputs a data.frame containing distances. The corresponding dates for each track segment (trkseg) will be included if present in the source file, else the date column will be populated with NAs.

Usage

read_gpx(file, units = c("miles", "kilometers"))

Arguments

file

The input file to be parsed.

units

The units desired for the distance metric.

Details

Distances are computed using the Haversine formula and do not account for elevation changes.

This function treats the first timestamp of each trkseg as the date of record. Thus overnight track segments will all count toward the day in which the journey began.
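
The date-of-record rule can be illustrated with made-up timestamps in base R. The sketch below does not parse a GPX file; the `segments` list is a hypothetical stand-in for the track-point times of two track segments, and UTC is an assumed time zone.

```r
# Hypothetical track-point timestamps for two segments (made-up data).
segments <- list(
  # An overnight segment that starts on 1 June and ends on 2 June.
  as.POSIXct(c("2009-06-01 22:30:00", "2009-06-02 01:15:00"), tz = "UTC"),
  # A daytime segment entirely on 2 June.
  as.POSIXct(c("2009-06-02 08:00:00", "2009-06-02 09:30:00"), tz = "UTC")
)

# The first timestamp of each segment decides the date for the whole segment.
dates <- do.call(
  c,
  lapply(segments, function(ts) as.Date(ts[1], tz = "UTC"))
)
dates
```

The overnight segment is recorded entirely under 2009-06-01, even though it finishes the following morning.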

Value

A data frame containing up to two columns:

date: The date of the ride. See description and details.
distance: The distance of the track segment in the requested units.

Examples

## Not run: 
# Get a list of all GPX export files in a directory tree
gpx_export_files <- list.files(
  "/path/to/gpx/exports/",
  pattern = "\\.gpx$",
  full.names = TRUE,
  recursive = TRUE
)

# Read in all files and combine them into a single data frame
rides <- do.call(rbind, lapply(gpx_export_files, read_gpx))

## End(Not run)

A year of simulated bicycle ride mileages

Description

Simulated dates and distances of rides occurring in 2009.

Usage

rides

Format

A data frame with 250 rows and 2 variables:

ride_date: date the ride occurred
ride_length: the length in miles

Details

The dataset contains a total of 3,419 miles spread across 178 unique days. The Eddington number for the year was 29.
