Help for package bibnets

Title:

Importing, Constructing, and Exporting Bibliometric Networks

Version:

0.6.0

Description:

Imports, constructs, and exports bibliometric networks from scholarly metadata. Reads 'Scopus', 'Web of Science', 'BibTeX', 'RIS', 'OpenAlex', 'Lens.org', 'Dimensions', and 'Crossref' exports. Goes beyond standard co-networks with attention-weighted networks (lead, last, proximity, circular position weights), position-aware counting (harmonic, arithmetic, geometric, golden-ratio), similarity and dissimilarity normalisations, temporal networks with fixed, sliding, and cumulative windows, disparity-filter backbone extraction, historiograph construction, and local citation scoring. Methods described in López-Pernas, Saqr & Apiola (2023) <doi:10.1007/978-3-031-25336-2_5>.

License:

MIT + file LICENSE

URL:

https://github.com/mohsaqr/bibnets

BugReports:

https://github.com/mohsaqr/bibnets/issues

Depends:

R (≥ 4.1.0)

Imports:

Matrix, stats, utils

Suggests:

cograph, igraph, knitr, openalexR, rcrossref, rmarkdown, testthat (≥ 3.0.0), tidygraph

VignetteBuilder:

knitr

Config/testthat/edition:

Encoding:

UTF-8

LazyData:

true

LazyDataCompression:

RoxygenNote:

7.3.3

NeedsCompilation:

Packaged:

2026-06-18 18:29:30 UTC; mohammedsaqr

Author:

Mohammed Saqr [aut, cre, cph], Sonsoles López-Pernas [aut]

Maintainer:

Mohammed Saqr <saqr@saqr.me>

Repository:

CRAN

Date/Publication:

2026-06-18 19:00:02 UTC

Aggregate multi-valued fields by an entity

Description

Groups documents by a single-valued or list-column entity (e.g., author, journal) and pools all values from another list-column (e.g., references, keywords) across documents belonging to that entity.

Usage

aggregate_by_entity(data, entity_field, value_field, min_freq = 1L)

Arguments

data

A data frame with id and the specified columns.

entity_field

Character. Name of the entity column. If it is a scalar column (e.g., "journal"), each document belongs to one entity. If it is a list-column (e.g., "authors"), each document may belong to multiple entities.

value_field

Character. Name of the list-column to aggregate (e.g., "references").

min_freq

Integer. Minimum number of papers per entity. Default 1.

Value

A data frame with columns id (entity name) and value_field (list-column of pooled values, with duplicates preserved).
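
The pooling step can be pictured with a few lines of base R. This is only a sketch of the idea, not the package's implementation; the toy data frame papers and its contents are made up for illustration.

```r
# Toy data (hypothetical): three papers, one journal each, a list-column of references.
papers <- data.frame(id = c("W1", "W2", "W3"),
                     journal = c("JOI", "JOI", "QSS"))
papers$references <- list(c("R1", "R2"), c("R2", "R3"), c("R1"))

# Group the reference lists by journal, then flatten each group.
# unlist() keeps duplicates, so R2 appears twice for JOI.
by_journal <- split(papers$references, papers$journal)
pooled <- lapply(by_journal, function(x) unlist(x, use.names = FALSE))

# Assemble a result shaped like the documented return value.
result <- data.frame(id = names(pooled))
result$references <- unname(pooled)
result
```

For a list-column entity such as authors, the same idea would apply after repeating each paper once per author.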

Align columns before row-binding bibliographic files

Description

Align columns before row-binding bibliographic files

Usage

align_biblio_columns(dfs)

Apply counting weights to a generic bipartite matrix

Description

Applies position-independent counting to non-author fields (references, keywords, etc.) by modifying the row weights of the bipartite matrix.

Usage

apply_counting(B, counting = "full", network_type = "symmetric")

Arguments

B

A sparse binary bipartite matrix (works x entities).

counting

Character. One of "full", "fractional", "paper", "strength".

network_type

Character. "symmetric" or "coupling".

Value

A weighted sparse matrix.
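
As an illustration of row re-weighting, the sketch below divides each row of a binary works x entities matrix by its row total, so every work contributes a total weight of 1. That this is what counting = "fractional" does is an assumption made for the example; the exact formulas behind "fractional", "paper", and "strength" are not stated here.

```r
library(Matrix)

# Binary works x entities matrix: W1 cites R1, R2, R3; W2 cites R1, R2.
B <- sparseMatrix(i = c(1, 1, 1, 2, 2), j = c(1, 2, 3, 1, 2), x = 1,
                  dims = c(2, 3),
                  dimnames = list(c("W1", "W2"), c("R1", "R2", "R3")))

# Assumed fractional scheme: scale each row by 1 / (number of entities in the work).
row_totals <- rowSums(B)
B_frac <- Diagonal(x = 1 / row_totals) %*% B
dimnames(B_frac) <- dimnames(B)  # the diagonal factor carries no row names
B_frac
```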

Build an author network

Description

Constructs a network between authors using one of four relationship types and any of 13 counting methods, including 9 position-dependent methods that respect author byline order.

Usage

author_network(
  data,
  type = "collaboration",
  counting = "full",
  similarity = "none",
  threshold = 0,
  min_occur = 1L,
  position_weights = c(1, 0.8, 0.6, 0.4),
  first_last_weight = 2,
  attention = NULL,
  top_n = NULL,
  self_loops = FALSE,
  deduplicate = TRUE,
  format = "edgelist",
  authors = "authors",
  sep = ";",
  references_sep = ";",
  strip_quotes = TRUE,
  id = NULL
)

Arguments

data

A data frame with at least id and an author column (list-column or delimited string, order preserved). For coupling/co-citation, also needs references.

type

Character. Relationship type:

"collaboration": Co-authorship: authors linked when they co-author a publication.
"coupling": Bibliographic coupling aggregated at author level: authors linked when they cite the same references.
"co_citation": Author co-citation: authors linked when they are cited together by the same paper. Requires a cited_first_authors list-column.
"equivalence": Profile similarity: cosine similarity of authors' full collaboration/citation profiles.

counting

Character. Counting method. Position-independent methods ("full", "fractional", "paper", "strength") work for all types. Position-dependent methods ("harmonic", "arithmetic", "geometric", "adaptive_geometric", "golden", "first", "last", "first_last", "position_weighted") are available for type = "collaboration".

similarity

Character. Similarity measure: "none", "association", "cosine", "jaccard", "inclusion", "equivalence".

threshold

Numeric. Minimum edge weight. Default 0.

min_occur

Integer. Minimum number of papers for an author to be included. Default 1.

position_weights

Numeric vector. Custom weights for counting = "position_weighted". Default c(1, 0.8, 0.6, 0.4).

first_last_weight

Numeric. Multiplier for counting = "first_last". Default 2.

attention

Character or NULL. Attention-based weighting independent of type and counting. One of "proximity" (center authors weighted most), "lead" (first author dominates, quadratic drop), "last" (last author dominates, quadratic rise), "circular" (first and last both prominent). Default NULL (disabled).

top_n

Integer or NULL. Return only the top n edges by weight. Default NULL (all edges).

self_loops

Logical. If TRUE, include self-loops (an entity linked to itself). Default FALSE.

deduplicate

Logical. If TRUE (default), each (paper, entity) pair is counted at most once: duplicate entries in the source data (e.g., the same author listed twice on a paper) are treated as one occurrence. Set to FALSE to count every raw occurrence.

format

Character. Output format:

"edgelist": Default. A bibnets_network data frame with columns from, to, weight, count.
"gephi": Gephi-ready data frame: Source, Target, Weight, Count, Type.
"igraph": An igraph graph object (requires igraph).
"cograph": A cograph_network object (requires cograph).
"matrix": A sparse adjacency matrix.

authors

Character. Name of the column containing authors. Default "authors". Use this to point at any column of a custom data set, e.g. authors = "Author Names".

sep

Character. Separator used to split the entity column when it is a plain character column rather than a list-column, e.g. sep = "," or sep = " and ". Default ";". Ignored for list-columns. sep applies only to the author column; the references column uses references_sep.

references_sep

Character. Separator for the references column in type = "coupling". Default ";" (reference strings usually contain internal commas, so this is kept independent of sep). Set it when your references are delimited differently.

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each entity, so a quoted CSV value such as "Alice" or ""Alice"" is treated as Alice. Set to FALSE to keep quotes as part of the label.

id

Optional. Name of the column to use as the work identifier (the matrix-row dimension). If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Value

Depends on format: a bibnets_network data frame (default), a Gephi-ready data frame, an igraph graph, a cograph_network, or a sparse matrix.

Examples

data(biblio_data)
author_network(biblio_data, "collaboration")
author_network(biblio_data, "collaboration", counting = "harmonic")
author_network(biblio_data, "collaboration", counting = "geometric",
               similarity = "association")

# Custom CSV: any column name, any separator
d <- data.frame(id = 1:3,
                Researchers = c("Smith J, Doe A", "Smith J, Lee K",
                                "Doe A, Lee K"))
author_network(d, authors = "Researchers", sep = ",")

Compute positional author weights for a single paper

Description

Given the number of authors and their positions, returns a weight vector for positional counting methods. All methods normalize weights to sum to 1 per paper. Position-independent methods (fractional, paper, strength) are handled by apply_counting() instead.

Usage

author_weights(
  n,
  counting = "fractional",
  position_weights = c(1, 0.8, 0.6, 0.4),
  first_last_weight = 2
)

Arguments

n

Integer. Number of authors.

counting

Character. Counting method.

position_weights

Numeric vector. Custom weights for counting = "position_weighted". Default c(1, 0.8, 0.6, 0.4).

first_last_weight

Numeric. Multiplier for first/last authors when counting = "first_last". Default 2.

Value

Numeric vector of length n, summing to 1.
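
To make the "sums to 1 per paper" property concrete, here is a minimal sketch of harmonic counting under the usual definition, where the author in position i receives a raw weight of 1/i before normalization. The helper harmonic_weights() is hypothetical and the package's exact formula may differ.

```r
# Hypothetical helper: harmonic positional weights for a paper with n authors.
harmonic_weights <- function(n) {
  raw <- 1 / seq_len(n)  # raw weights 1, 1/2, 1/3, ...
  raw / sum(raw)         # normalize so the weights sum to 1
}

w <- harmonic_weights(3)
w
sum(w)
```

With three authors the weights are 6/11, 3/11, and 2/11, so the first author receives three times the credit of the third.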

Extract network backbone using the disparity filter

Description

Applies the disparity filter to a weighted edge list. For each edge, it computes an alpha (p-value) from both endpoints and keeps the edge if it is statistically significant from at least one endpoint.

Usage

backbone(edges, alpha = 0.05)

Arguments

edges

A data frame with at least columns from, to, and weight. Must be an undirected edge list (each pair appears once).

alpha

Numeric. Significance threshold in (0, 1). Default 0.05.

Details

The null model asks: given that node i has total strength s_i distributed uniformly across k_i edges, what is the probability that a single edge weight is as large as w_{ij}? The answer is

\alpha_{ij} = \left(1 - \frac{w_{ij}}{s_i}\right)^{k_i - 1}

An edge is retained if \min(\alpha_{ij}, \alpha_{ji}) < \alpha. Nodes with only one edge always have \alpha = 0 and are always kept.
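
The formula can be checked by hand on a small edge list. The following base R sketch applies it directly; it is not the package's code and does not reproduce any special handling of nodes with a single edge (none occur in this toy network).

```r
edges <- data.frame(
  from   = c("A", "A", "A", "B", "C"),
  to     = c("B", "C", "D", "C", "D"),
  weight = c(10, 1, 1, 8, 1)
)

# Strength s_i and degree k_i, counting each edge at both endpoints.
ends     <- c(edges$from, edges$to)
weights2 <- rep(edges$weight, 2)
strength <- vapply(split(weights2, ends), sum, numeric(1))
degree   <- vapply(split(ends, ends), length, integer(1))

# alpha_ij = (1 - w_ij / s_i)^(k_i - 1), evaluated from each endpoint.
alpha_from <- (1 - edges$weight / strength[edges$from])^(degree[edges$from] - 1)
alpha_to   <- (1 - edges$weight / strength[edges$to])^(degree[edges$to] - 1)

# Keep an edge when the smaller of its two alphas is below the threshold.
edges$alpha <- pmin(unname(alpha_from), unname(alpha_to))
kept <- edges[edges$alpha < 0.05, ]
kept
```

Here only A-B (alpha = 1/36, seen from A) and B-C (alpha = 0.04, seen from C) pass the 0.05 threshold.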

Value

The filtered edge data frame with an added alpha column (the minimum alpha from the two endpoints).

Examples

edges <- data.frame(
  from   = c("A", "A", "A", "B", "C"),
  to     = c("B", "C", "D", "C", "D"),
  weight = c(10,   1,   1,   8,   1)
)
backbone(edges, alpha = 0.05)

Example bibliometric dataset

Description

A small synthetic dataset of 10 scholarly papers with overlapping authors, references, and keywords. Designed for testing and demonstrating all network construction functions in bibnets.

Usage

biblio_data

Format

A data frame with 10 rows and 9 columns:

id: Unique document identifier (W1–W10).
title: Document title.
year: Publication year (2018–2022).
journal: Source journal (Scientometrics, Journal of Informetrics, JASIST, Quantitative Science Studies).
doi: DOI string.
cited_by_count: Times cited.
authors: List-column of author name strings (6 unique authors).
references: List-column of cited reference IDs (10 unique refs, R1–R10). Each paper cites exactly 4 references.
keywords: List-column of keyword strings (24 unique keywords). Each paper has 3 keywords.

Examples

data(biblio_data)
reference_network(biblio_data)
document_network(biblio_data, "coupling")
author_network(biblio_data, "collaboration")

Build a weighted bipartite matrix using positional counting

Description

For position-dependent counting methods, this replaces the binary bipartite matrix entries with positional weights derived from author order.

Usage

build_author_bipartite(
  data,
  field = "authors",
  counting = "full",
  position_weights = c(1, 0.8, 0.6, 0.4),
  first_last_weight = 2,
  deduplicate = TRUE,
  sep = ";",
  strip_quotes = TRUE
)

Arguments

data

A data frame with id and authors (list-column where author order is preserved).

counting

Character. Counting method.

position_weights

Numeric vector for counting = "position_weighted".

first_last_weight

Numeric for counting = "first_last".

Value

A sparse weighted bipartite matrix (works x authors).

Build a bipartite incidence matrix from bibliometric data

Description

Constructs a sparse works x entities two-mode matrix from a data frame with a list-column. This is the core engine behind all network construction functions.

Usage

build_bipartite(
  data,
  field,
  min_freq = 1L,
  deduplicate = TRUE,
  sep = ";",
  strip_quotes = TRUE
)

Arguments

data

A data frame with at least columns id and the field specified.

field

Character. Name of the list-column containing entities (e.g., "authors", "references", "keywords").

min_freq

Integer. Minimum number of occurrences for an entity to be included. Default 1 (no filtering).

sep

Character. Separator used to split field when it is a plain character column. Default ";".

strip_quotes

Logical. Strip surrounding quote characters from entities. Default TRUE.

Value

A sparse dgCMatrix with rows = works (named by id) and columns = unique entities.
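
For orientation, a works x entities incidence matrix of this kind can be assembled with Matrix::sparseMatrix(). The sketch below is an independent illustration on made-up data, not the package's internal code.

```r
library(Matrix)

# Toy data (hypothetical): three works with a keyword list-column.
works <- data.frame(id = c("W1", "W2", "W3"))
works$keywords <- list(c("ml", "nlp"), c("ml", "cv"), c("nlp", "cv", "ml"))

# Long form: one row per (work, entity) pair, duplicates dropped.
long <- data.frame(work   = rep(works$id, lengths(works$keywords)),
                   entity = unlist(works$keywords))
long <- unique(long)

# Map pairs to row and column indices and fill a sparse matrix with ones.
entities <- sort(unique(long$entity))
B <- sparseMatrix(i = match(long$work, works$id),
                  j = match(long$entity, entities),
                  x = 1,
                  dims = c(nrow(works), length(entities)),
                  dimnames = list(works$id, entities))
B
```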

Build bipartite matrix from a long-format edge table

Description

Alternative constructor when data is already in long form (e.g., a two-column data frame of document-reference pairs).

Usage

build_bipartite_long(edges, min_freq = 1L)

Arguments

edges

A data frame with columns source (work id) and target (entity id).

min_freq

Integer. Minimum entity frequency. Default 1.

Value

A sparse dgCMatrix.

Build a network where entities share values from another field

Description

Build a network where entities share values from another field

Usage

build_by_network(
  data,
  field,
  by,
  counting,
  similarity,
  threshold,
  min_occur,
  top_n = NULL,
  self_loops = FALSE,
  deduplicate = TRUE,
  strip_quotes = TRUE
)

Build time window definitions

Description

Build time window definitions

Usage

build_windows(min_time, max_time, window, step, strategy)

Build a co-occurrence network from any field

Description

With one field, entities are linked when they co-occur in the same document. With by, entities are linked when they share values of the by field across documents.

Usage

conetwork(
  data,
  field,
  by = NULL,
  sep = ";",
  counting = "full",
  similarity = "none",
  threshold = 0,
  min_occur = 1L,
  top_n = NULL,
  self_loops = FALSE,
  deduplicate = TRUE,
  format = "edgelist",
  strip_quotes = TRUE,
  id = NULL
)

Arguments

data

A data frame with column id and the specified field(s).

field

Character. The entity field — determines what the nodes are.

by

Character or NULL. What links the nodes. If NULL (default), entities are linked by co-occurring in the same document. If specified, entities are linked when they share values from the by field.

sep

Character or NULL. Delimiter for splitting character columns. Default ";". Set to NULL if columns are already list-columns.

counting

Character. Counting method. Default "full".

similarity

Character. Normalization method. Default "none".

threshold

Numeric. Minimum edge weight. Default 0.

min_occur

Integer. Minimum entity frequency. Default 1.

top_n

Integer or NULL. Return only the top n edges by weight. Default NULL (all edges).

self_loops

Logical. If TRUE, include self-loops (an entity linked to itself). Default FALSE.

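deduplicate

Logical. If TRUE (default), each (paper, entity) pair is counted at most once. Set to FALSE to count every raw occurrence.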

format

Character. Output format:

"edgelist": Default. A bibnets_network data frame with columns from, to, weight, count.
"gephi": Gephi-ready data frame: Source, Target, Weight, Count, Type.
"igraph": An igraph graph object (requires igraph).
"cograph": A cograph_network object (requires cograph).
"matrix": A sparse adjacency matrix.

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each entity.

id

Optional. Name of the column to use as the work identifier (the matrix-row dimension). If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Details

Fields can be list-columns (already split) or character columns with delimiters (auto-split via sep).

Value

Depends on format: a bibnets_network data frame (default), a Gephi-ready data frame, an igraph graph, a cograph_network, or a sparse matrix.

Examples

data(biblio_data)

# Co-occurrence: keywords appearing in the same document
conetwork(biblio_data, "keywords")

# Authors linked by shared keywords
conetwork(biblio_data, "authors", by = "keywords")

# Keywords linked by shared authors
conetwork(biblio_data, "keywords", by = "authors")

# Journals linked by shared references (= journal coupling)
conetwork(biblio_data, "journal", by = "references", similarity = "cosine")

# Auto-splits semicolon-delimited string columns
d <- data.frame(id = 1:3, tags = c("ml; dl; nlp", "ml; cv", "dl; cv"))
conetwork(d, "tags")

Build a country network

Description

Constructs a network between countries based on collaboration or coupling.

Usage

country_network(
  data,
  type = "collaboration",
  counting = "full",
  similarity = "none",
  threshold = 0,
  min_occur = 1L,
  attention = NULL,
  top_n = NULL,
  self_loops = FALSE,
  deduplicate = TRUE,
  format = "edgelist",
  countries = "countries",
  sep = ";",
  references_sep = ";",
  strip_quotes = TRUE,
  id = NULL
)

Arguments

data

A data frame with id and a country column (list-column or delimited string). For coupling, also needs references.

type

Character. "collaboration" (default), "coupling", or "equivalence".

counting

Character. Counting method. Default "full".

similarity

Character. Similarity measure. Default "none".

threshold

Numeric. Minimum edge weight. Default 0.

min_occur

Integer. Minimum papers per country. Default 1.

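attention

Character or NULL. Attention-based weighting; see the attention argument of author_network(). Default NULL (disabled).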

top_n

Integer or NULL. Return only the top n edges by weight. Default NULL (all edges).

self_loops

Logical. If TRUE, include self-loops (an entity linked to itself). Default FALSE.

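deduplicate

Logical. If TRUE (default), each (paper, entity) pair is counted at most once. Set to FALSE to count every raw occurrence.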

format

Character. Output format:

"edgelist": Default. A bibnets_network data frame with columns from, to, weight, count.
"gephi": Gephi-ready data frame: Source, Target, Weight, Count, Type.
"igraph": An igraph graph object (requires igraph).
"cograph": A cograph_network object (requires cograph).
"matrix": A sparse adjacency matrix.

countries

Character. Name of the column containing countries. Default "countries".

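sep

Character. Separator used to split the countries column when it is a plain character column. Default ";". Ignored for list-columns.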

references_sep

Character. Separator for the references column in type = "coupling". Default ";".

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each entity.

id

Optional. Name of the column to use as the work identifier (the matrix-row dimension). If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Value

Depends on format: a bibnets_network data frame (default), a Gephi-ready data frame, an igraph graph, a cograph_network, or a sparse matrix.

Examples

data(learning_analytics)
country_network(learning_analytics, "collaboration")

Detect bibliometric file format

Description

Detect bibliometric file format

Usage

detect_format(file)

Arguments

file

Path to file.

Value

Character: format name or "unknown".

Build a document network

Description

Constructs a network between documents (papers) in the dataset.

Usage

document_network(
  data,
  type = "coupling",
  counting = "full",
  similarity = "none",
  threshold = 0,
  min_occur = 1L,
  top_n = NULL,
  self_loops = FALSE,
  deduplicate = TRUE,
  format = "edgelist",
  references = "references",
  sep = ";",
  strip_quotes = TRUE,
  id = NULL
)

Arguments

data

A data frame with id and a references column (list-column or delimited string).

type

Character. Relationship type:

"coupling": Bibliographic coupling: documents linked when they share cited references.
"citation": Direct citation: directed edges from citing to cited documents (internal citations only).
"co_citation": Co-citation: documents linked when they are cited together by other documents in the dataset.
"equivalence": Profile similarity of reference vectors.

counting

Character. Counting method. Default "full". Position-dependent methods are not applicable to document networks.

similarity

Character. Similarity measure. Default "none".

threshold

Numeric. Minimum edge weight. Default 0.

min_occur

Integer. Minimum reference frequency. Default 1.

top_n

Integer or NULL. Return only the top n edges by weight. Default NULL (all edges).

self_loops

Logical. If TRUE, include self-loops (an entity linked to itself). Default FALSE.

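deduplicate

Logical. If TRUE (default), each (paper, entity) pair is counted at most once. Set to FALSE to count every raw occurrence.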

format

Character. Output format:

"edgelist": Default. A bibnets_network data frame with columns from, to, weight, count.
"gephi": Gephi-ready data frame: Source, Target, Weight, Count, Type.
"igraph": An igraph graph object (requires igraph).
"cograph": A cograph_network object (requires cograph).
"matrix": A sparse adjacency matrix.

references

Character. Name of the column containing cited references. Default "references".

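sep

Character. Separator used to split the references column when it is a plain character column. Default ";".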

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each reference.

id

Optional. Name of the column to use as the work identifier (the matrix-row dimension). If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Value

Depends on format. For type = "citation", edges are directed (from = citing, to = cited) with weight and count both 1.

Examples

data(biblio_data)
document_network(biblio_data, "coupling")
document_network(biblio_data, "coupling", counting = "strength")

Convert edge list to a square sparse matrix

Description

Convert edge list to a square sparse matrix

Usage

edgelist_to_mat(edges, nodes = NULL, symmetric = TRUE)

Arguments

edges

A data frame with columns from, to, weight.

nodes

Optional character vector of node names. If NULL, derived from the edge list.

symmetric

Logical. If TRUE (default), the matrix is made symmetric.

Value

A sparse dgCMatrix.
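
A minimal sketch of the conversion, assuming an undirected edge list without self-loops: each edge is written into the matrix twice, once per direction, which makes the result symmetric. This illustrates the idea only and is not the package's code.

```r
library(Matrix)

edges <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"),
                    weight = c(2, 1, 3))

# Derive the node set from the edge list, as when nodes = NULL.
nodes <- sort(unique(c(edges$from, edges$to)))
i <- match(edges$from, nodes)
j <- match(edges$to, nodes)

# Mirror every edge so that M[i, j] == M[j, i].
M <- sparseMatrix(i = c(i, j), j = c(j, i), x = rep(edges$weight, 2),
                  dims = rep(length(nodes), 2),
                  dimnames = list(nodes, nodes))
M
```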

Ensure a column is a list-column, splitting if needed

Description

Ensure a column is a list-column, splitting if needed

Usage

ensure_list_column(data, field, sep = ";", strip_quotes = TRUE)

Arguments

strip_quotes

Logical. If TRUE (default), surrounding quote characters and whitespace are removed from every entity (e.g. a quoted CSV value "Alice" becomes Alice). See strip_surrounding_quotes().
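
The splitting and cleaning step can be sketched in base R as follows. The helper split_entities() is hypothetical, and the set of quote characters it strips (single and double quotes) is an assumption rather than the package's documented behaviour.

```r
# Hypothetical helper: split delimited strings into a list of clean entities.
split_entities <- function(x, sep = ";") {
  parts <- strsplit(x, sep, fixed = TRUE)
  lapply(parts, function(p) {
    p <- trimws(p)                        # drop surrounding whitespace
    p <- gsub("^[\"']+|[\"']+$", "", p)   # drop surrounding quote characters
    p[nzchar(p)]                          # discard empty pieces
  })
}

out <- split_entities(c("\"Alice\"; Bob", "Carol;"))
out
```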

Filter edges to top-n nodes

Description

Keeps only edges between the most frequent nodes. Node frequency is determined by how many edges each node participates in.

Usage

filter_top(edges, n)

Arguments

edges

A data frame with at least from, to, weight columns.

n

Integer. Number of top nodes to keep.

Value

A filtered data frame with edges among the top n nodes.
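
The selection rule described above can be sketched in base R: count how many edges each node takes part in, pick the n most frequent nodes, and keep the edges whose endpoints are both in that set. How the function breaks ties is not documented here, so the toy data avoids ties at the cut-off.

```r
edges <- data.frame(from   = c("A", "A", "A", "B", "C"),
                    to     = c("B", "C", "D", "C", "E"),
                    weight = c(3, 2, 1, 2, 1))

# Node frequency = number of edges the node participates in.
node_freq <- sort(table(c(edges$from, edges$to)), decreasing = TRUE)
top_nodes <- names(node_freq)[seq_len(2)]  # A and C, with 3 edges each

# Keep only edges whose two endpoints are both top nodes.
kept <- edges[edges$from %in% top_nodes & edges$to %in% top_nodes, ]
kept
```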

Examples

data(biblio_data)
edges <- author_network(biblio_data, "collaboration")

# Keep only edges among the top 3 most connected authors
filter_top(edges, 3)

Build a historiograph (chronological citation network)

Description

Constructs a Garfield-style historiograph: a directed citation network among the most locally cited documents, laid out chronologically.

Usage

historiograph(
  data,
  n = 30,
  min_lcs = 1,
  references = "references",
  sep = ";",
  strip_quotes = TRUE,
  id = NULL
)

Arguments

data

A data frame with id, a references column (list-column or delimited string), and year. Optionally title, journal, doi, cited_by_count.

n

Integer. Number of top locally cited documents to include. Default 30.

min_lcs

Integer. Minimum local citation score for inclusion. Default 1.

references

Character. Name of the column containing cited references. Default "references".

sep

Character. Separator used to split the references column when it is a plain character column. Default ";".

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each reference.

id

Optional. Name of the column to use as the work identifier. If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Value

A list with:

$nodes: Data frame of included documents with id, lcs, gcs, year, title, journal, doi.
$edges: Data frame of directed citation edges with from (citing), to (cited), year_from, year_to.

Examples

data(biblio_data)
h <- historiograph(biblio_data, n = 5)
h$nodes
h$edges

Build an institution network

Description

Constructs a network between institutions (affiliations).

Usage

institution_network(
  data,
  type = "collaboration",
  counting = "full",
  similarity = "none",
  threshold = 0,
  min_occur = 1L,
  attention = NULL,
  top_n = NULL,
  self_loops = FALSE,
  deduplicate = TRUE,
  format = "edgelist",
  affiliations = "affiliations",
  sep = ";",
  references_sep = ";",
  strip_quotes = TRUE,
  id = NULL
)

Arguments

data

A data frame with id and an affiliation column (list-column or delimited string). For coupling, also needs references.

type

Character. "collaboration" (default), "coupling", or "equivalence".

counting

Character. Counting method. Default "full".

similarity

Character. Similarity measure. Default "none".

threshold

Numeric. Minimum edge weight. Default 0.

min_occur

Integer. Minimum papers per institution. Default 1.

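attention

Character or NULL. Attention-based weighting; see the attention argument of author_network(). Default NULL (disabled).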

top_n

Integer or NULL. Return only the top n edges by weight. Default NULL (all edges).

self_loops

Logical. If TRUE, include self-loops (an entity linked to itself). Default FALSE.

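deduplicate

Logical. If TRUE (default), each (paper, entity) pair is counted at most once. Set to FALSE to count every raw occurrence.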

format

Character. Output format:

"edgelist": Default. A bibnets_network data frame with columns from, to, weight, count.
"gephi": Gephi-ready data frame: Source, Target, Weight, Count, Type.
"igraph": An igraph graph object (requires igraph).
"cograph": A cograph_network object (requires cograph).
"matrix": A sparse adjacency matrix.

affiliations

Character. Name of the column containing institutions. Default "affiliations".

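sep

Character. Separator used to split the affiliations column when it is a plain character column. Default ";". Ignored for list-columns.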

references_sep

Character. Separator for the references column in type = "coupling". Default ";".

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each entity.

id

Optional. Name of the column to use as the work identifier (the matrix-row dimension). If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Value

Depends on format: a bibnets_network data frame (default), a Gephi-ready data frame, an igraph graph, a cograph_network, or a sparse matrix.

Examples

data(learning_analytics)
institution_network(learning_analytics, "collaboration")

Build a keyword co-occurrence network

Description

Constructs a network where two keywords are linked when they appear together in the same document.

Usage

keyword_network(
  data,
  keywords = "keywords",
  counting = "full",
  similarity = "none",
  threshold = 0,
  min_occur = 1L,
  attention = NULL,
  top_n = NULL,
  self_loops = FALSE,
  deduplicate = TRUE,
  format = "edgelist",
  sep = ";",
  strip_quotes = TRUE,
  field = NULL,
  id = NULL
)

Arguments

data

A data frame with id and a keyword column (list-column or delimited string).

keywords

Character. Name of the keyword column (list-column or delimited string). Default "keywords". Any column of a custom data set works, e.g. keywords = "Tags".

counting

Character. Counting method. Default "full".

similarity

Character. Similarity measure. Default "none".

threshold

Numeric. Minimum edge weight. Default 0.

min_occur

Integer. Minimum keyword frequency. Default 1.

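attention

Character or NULL. Attention-based weighting; see the attention argument of author_network(). Default NULL (disabled).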

top_n

Integer or NULL. Return only the top n edges by weight. Default NULL (all edges).

self_loops

Logical. If TRUE, include self-loops (an entity linked to itself). Default FALSE.

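deduplicate

Logical. If TRUE (default), each (paper, entity) pair is counted at most once. Set to FALSE to count every raw occurrence.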

format

Character. Output format:

"edgelist": Default. A bibnets_network data frame with columns from, to, weight, count.
"gephi": Gephi-ready data frame: Source, Target, Weight, Count, Type.
"igraph": An igraph graph object (requires igraph).
"cograph": A cograph_network object (requires cograph).
"matrix": A sparse adjacency matrix.

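sep

Character. Separator used to split the keyword column when it is a plain character column. Default ";". Ignored for list-columns.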

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each keyword.

field

Deprecated. Use keywords instead.

id

Optional. Name of the column to use as the work identifier (the matrix-row dimension). If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Value

Depends on format: a bibnets_network data frame (default), a Gephi-ready data frame, an igraph graph, a cograph_network, or a sparse matrix.

Examples

data(biblio_data)
keyword_network(biblio_data)
keyword_network(biblio_data, similarity = "association")

# Custom CSV: any column name, any separator
d <- data.frame(id = 1:3, Tags = c("ml, ai", "ml, nlp", "ai, nlp"))
keyword_network(d, keywords = "Tags", sep = ",")

Learning analytics dataset (OpenAlex)

Description

A corpus of 1,508 gold open-access scholarly works on learning analytics, retrieved from OpenAlex (CC0 licence). All records have a verified title, publication year, and at least one author. Journal names are present for works published in a named source; preprints and book chapters may have NA in journal.

Usage

learning_analytics

Format

A data frame with 1,508 rows and 11 columns:

id: OpenAlex work ID (e.g. "W2769342982").
title: Work title.
year: Publication year (integer).
journal: Source name, or NA if not available.
doi: DOI string without the https://doi.org/ prefix, or NA.
cited_by_count: Number of citing works as recorded in OpenAlex.
type: Work type ("article", "review", "preprint", "book-chapter", etc.).
authors: List-column of author display names (pipe-split from the OpenAlex flat export; one name per authorship slot).
keywords: List-column with one element: the primary OpenAlex topic for the work (e.g. "Online Learning and Analytics").
affiliations: List-column of institution display names (one entry per authorship–institution pair).
countries: List-column of two-letter ISO country codes (one entry per authorship–institution pair).

Source

OpenAlex https://openalex.org, CC0 licence.

Examples

data(learning_analytics)
author_network(learning_analytics, "collaboration")
country_network(learning_analytics, "collaboration")

Compute local citation scores

Description

Counts how many times each document is cited by other documents within the dataset.

Usage

local_citations(
  data,
  references = "references",
  sep = ";",
  strip_quotes = TRUE,
  id = NULL
)

Arguments

data

A data frame with id and a references column (list-column or delimited string). Optionally year, title, journal, doi, cited_by_count.

references

Character. Name of the column containing cited references. Default "references".

sep

Character. Separator used to split the references column when it is a plain character column. Default ";".

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each reference.

id

Optional. Name of the column to use as the work identifier. If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Value

A data frame with columns:

id: Document identifier.
lcs: Local Citation Score: times cited within the dataset.
gcs: Global Citation Score: cited_by_count if available.

Plus any metadata columns present in the input (title, year, journal, doi).
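
The local citation score itself is a simple count, which the base R sketch below reproduces on made-up data. It assumes each paper lists a given reference at most once and is not the package's implementation.

```r
# Toy data (hypothetical): X9 is a reference outside the dataset.
docs <- data.frame(id = c("W1", "W2", "W3"))
docs$references <- list(c("X9"), c("W1", "X9"), c("W1", "W2"))

# Keep only references that point at documents inside the dataset.
cited      <- unlist(docs$references)
local_hits <- cited[cited %in% docs$id]

# Tabulate against the full id set so uncited documents get a score of 0.
lcs <- as.integer(table(factor(local_hits, levels = docs$id)))
data.frame(id = docs$id, lcs = lcs)
```

W1 is cited by W2 and W3 (lcs = 2), W2 by W3 (lcs = 1), and W3 by nobody (lcs = 0).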

Examples

data(biblio_data)
local_citations(biblio_data)

Convert a sparse square matrix to a tidy edge list

Description

Extracts non-zero entries directly from the sparse representation — no dense matrix allocation. For undirected networks, only the upper triangle is returned.

Usage

mat_to_edgelist(A, directed = FALSE)

Arguments

A

A square sparse or dense matrix.

directed

Logical. If FALSE (default), returns only upper-triangle entries. If TRUE, returns all non-zero off-diagonal entries.

Value

A data frame with columns from, to, weight, sorted by descending weight.
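
One way to read non-zero entries without densifying is the triplet view returned by summary() on a sparse matrix from the Matrix package. The sketch below illustrates the undirected case on a toy matrix; it is not the package's code.

```r
library(Matrix)

# Symmetric toy matrix with three undirected edges.
A <- sparseMatrix(i = c(1, 2, 1, 3, 2, 3), j = c(2, 1, 3, 1, 3, 2),
                  x = c(2, 2, 1, 1, 3, 3), dims = c(3, 3),
                  dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# summary() lists the non-zero entries as (i, j, x) triplets.
trip <- as.data.frame(summary(A))
trip <- trip[trip$i < trip$j, ]  # strict upper triangle: each pair once

el <- data.frame(from   = rownames(A)[trip$i],
                 to     = colnames(A)[trip$j],
                 weight = trip$x)
el <- el[order(-el$weight), ]    # strongest edges first
el
```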

Construct a co-occurrence network via two-mode multiplication

Description

The unified engine for all bibliometric networks. Operates entirely in sparse representation — never allocates a dense n x n matrix.

Usage

multiply_bipartite(
  B,
  mode = "columns",
  similarity = "none",
  threshold = 0,
  top_n = NULL,
  self_loops = FALSE
)

Arguments

B

A sparse bipartite matrix (works x entities), already weighted by the counting method.

mode

Character. "columns" for column-mode co-occurrence (e.g., co-citation), "rows" for row-mode (e.g., coupling).

similarity

Character. Normalization method (see normalize()).

threshold

Numeric. Minimum edge weight to retain.

top_n

Integer or NULL. If specified, keep only the top n most frequent nodes and return all edges among them.

Value

A data frame with columns from, to, weight, count.
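
The two modes correspond to the two ways of multiplying a works x entities matrix by its own transpose. The sketch below shows both products with the Matrix package; mapping "columns" to crossprod() and "rows" to tcrossprod() is an inference from the argument descriptions, not a statement about the internal code.

```r
library(Matrix)

# W1 cites R1, R2; W2 cites R2, R3; W3 cites R2.
B <- sparseMatrix(i = c(1, 1, 2, 2, 3), j = c(1, 2, 2, 3, 2), x = 1,
                  dims = c(3, 3),
                  dimnames = list(c("W1", "W2", "W3"), c("R1", "R2", "R3")))

# Column mode: t(B) %*% B gives an entities x entities co-occurrence matrix.
co_cols <- crossprod(B)

# Row mode: B %*% t(B) gives a works x works matrix of shared entities.
co_rows <- tcrossprod(B)

co_cols
co_rows
```

Both products stay sparse, and the diagonal holds each item's own total (for example, R2 is cited by three works).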

Normalize a co-occurrence matrix

Description

Applies a similarity normalization to a square co-occurrence matrix. The diagonal of the input matrix is used as the total occurrence count for each item. Operates entirely in sparse representation.

Usage

normalize(A, method = "none")

Arguments

A

A square symmetric matrix (dense or sparse) representing co-occurrence counts.

method

Character. Normalization method:

"none": No normalization. Returns raw co-occurrence counts.
"association": Association strength (probabilistic affinity index). s_{ij} = c_{ij} / (w_i \cdot w_j). Often recommended as the best normalization for co-occurrence data.
"cosine": Salton's cosine. s_{ij} = c_{ij} / \sqrt{w_i \cdot w_j}.
"jaccard": Jaccard index. s_{ij} = c_{ij} / (w_i + w_j - c_{ij}).
"inclusion": Inclusion index (Simpson coefficient). s_{ij} = c_{ij} / \min(w_i, w_j).
"equivalence": Equivalence index (Salton's cosine squared). s_{ij} = c_{ij}^2 / (w_i \cdot w_j).

Value

A normalized sparse matrix of the same dimensions.

Examples

# Create a small co-occurrence matrix
A <- matrix(c(10, 3, 1, 3, 8, 2, 1, 2, 5), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
normalize(A, "association")
normalize(A, "cosine")
normalize(A, "jaccard")
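
To make the formulas concrete, the sketch below evaluates each index by hand for the pair (a, b) of the example matrix, using only base R. It applies the formulas listed under method directly and does not call normalize(), so it says nothing about how the function treats the diagonal.

```r
A <- matrix(c(10, 3, 1, 3, 8, 2, 1, 2, 5), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
w <- diag(A)          # total occurrences: a = 10, b = 8, c = 5
c_ab <- A["a", "b"]   # co-occurrences of a and b: 3

association <- c_ab / (w["a"] * w["b"])          # 3 / 80
cosine      <- c_ab / sqrt(w["a"] * w["b"])      # 3 / sqrt(80)
jaccard     <- c_ab / (w["a"] + w["b"] - c_ab)   # 3 / 15
inclusion   <- c_ab / min(w["a"], w["b"])        # 3 / 8
equivalence <- c_ab^2 / (w["a"] * w["b"])        # 9 / 80

round(unname(c(association, cosine, jaccard, inclusion, equivalence)), 4)
```

Note that the equivalence value equals the cosine value squared, as the definition states.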

Reorder and parse author names

Description

Converts author names to "First Last" order and breaks each name into its components. The parser is aware of nobiliary particles (van, von, de, del, da, der, ...) and generational suffixes (Jr, Sr, II, III, IV), and is case-insensitive so it handles bibnets' uppercased entity labels.

Usage

parse_names(
  x,
  format = c("first_last", "last_initials", "last"),
  surname_first = c("auto", "yes", "no")
)

Arguments

x

Character vector of author names, one name per element. NA and empty strings are preserved.

format

Output style for personal names (group/corporate authors, NA, and empty strings are returned unchanged in every style). One of:

"first_last": (default) "Saqr, Mohammed" -> "Mohammed Saqr".
"last_initials": "Saqr, Mohammed" -> "Saqr M."; multiple given names become concatenated initials ("Garcia Marquez G.J."); any suffix is appended ("Smith J. Jr").
"last": surname only, including any particle ("van der Berg", "de la Cruz").

surname_first

How to read comma-less strings (strings with a comma are always "Last, First"). One of:

"auto": (default) surname-first when the trailing token looks like initials, i.e. an all-uppercase token of 1-3 letters, the Scopus / bibnets signature ("WANG Y" is read as surname WANG, given name Y). Otherwise the string is treated as "First Last". This is the "bibnets takes precedence" bias: native bibnets/Scopus labels parse correctly with no extra arguments, while ordinary mixed-case "First Last" input is never misread.
"yes": force surname-first ("Wang Yong" -> surname Wang, given Yong).
"no": force given-first ("First Last"); comma-less input is returned unchanged.

May also be given as the logical TRUE / FALSE. Inherently ambiguous input (e.g. uppercase "MOHAMMED LI") follows the auto bias toward the bibnets/Scopus convention; pass "no" to override.

Details

Three name conventions are recognised:

"Last, First" (a comma) — always parsed as surname-then-given.
"SURNAME Initials" (no comma, e.g. "WANG Y", "AYALA-ROMERO JA") — the Scopus / bibnets author-label form.
"First Last" (no comma, e.g. "Mohammed Saqr").

Comma-less strings that look like group or corporate authors (e.g. "WHO Collaborating Group") are detected and left untouched, as are NA and empty strings.

This is an optional, standalone utility. No reader or network builder in bibnets calls it; entity labels are matched verbatim unless you choose to apply this function yourself first.

Value

A character vector the same length as x, formatted per format. The parsed components are attached as the attribute "parts" (independent of format): a data frame with columns original, first, last, particle, suffix, and type (one of "person", "organization", "empty", "missing"). Casing of the input is preserved; periods are stripped from parsed initials.

Input shape

parse_names() takes a flat character vector (one name per element) — not a data frame and not a list. bibnets readers store authors as a list-column (each element is a character vector, because a paper has a variable number of authors), so map the function over it rather than passing the column directly:

df$authors <- lapply(df$authors, parse_names, format = "last_initials")

For an ordinary flat character column (or the from / to columns of a bibnets_network), call it directly: parse_names(df$col).

Recommended workflow

Normalise names before building a network, on the reader's authors list-column. Node identity in bibnets is fixed when the bipartite matrix is built (labels are upper-cased and matched verbatim), so two spellings of one author ("Saqr, Mohammed" and "SAQR M") only merge into a single node if they are normalised before author_network() is called:

d <- read_biblio("scopus.csv")
d$authors <- lapply(d$authors, parse_names, format = "last_initials")
net <- author_network(d, type = "collaboration")

Applying to an existing edgelist

You can call parse_names() on the from / to (or source / target) columns of a built network, but it is a per-column, graph-blind relabelling. Edges, pairing, weight and count are preserved, with two caveats:

you must apply the same call to both endpoint columns, otherwise the two ends of an edge use different labels;
the mapping is many-to-one, so distinct authors can collapse onto one label (especially with "last_initials"), and bibnets does not re-aggregate the resulting duplicate edges.

Prefer the pre-build workflow above.
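
If an existing edge list has been relabelled anyway, duplicate pairs can be merged by hand. The base R sketch below assumes an undirected edge list with raw count weights, for which summing is meaningful; it is not appropriate for similarity-normalised weights, and the toy data are invented for the example.

```r
# Toy undirected edge list after a many-to-one relabelling (assumed data).
edges <- data.frame(from = c("Saqr M.", "Lopez A.", "Saqr M."),
                    to = c("Lopez A.", "Saqr M.", "Chen W."),
                    weight = c(2, 1, 1),
                    stringsAsFactors = FALSE)

# Put each pair in a canonical order so A-B and B-A count as one edge.
swap <- edges$from > edges$to
tmp <- edges$from[swap]
edges$from[swap] <- edges$to[swap]
edges$to[swap] <- tmp

# Sum the weights of duplicated pairs.
merged <- aggregate(weight ~ from + to, data = edges, FUN = sum)
merged
```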

Limitations

Comma-less names are inherently ambiguous; the auto heuristic is biased toward the bibnets/Scopus surname-first convention and may misread uppercase "GIVEN SURNAME" where the surname is 1-3 letters (e.g. "MOHAMMED LI"). Suffix-first garbage ("Jr., Sammy Davis") is not specially handled. Use surname_first to force interpretation when you know the source convention.
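
The trailing-token test behind the auto heuristic can be approximated in a few lines of base R. The helper looks_surname_first() below is hypothetical and covers only the documented rule (an all-uppercase final token of 1-3 letters); the real parser also handles particles, suffixes, and group authors.

```r
# Hypothetical helper: TRUE when the last whitespace-separated token is an
# all-uppercase run of 1-3 letters (the documented "initials" signature).
looks_surname_first <- function(x) {
  last_token <- sub("^.*[[:space:]]", "", trimws(x))
  grepl("^[[:upper:]]{1,3}$", last_token)
}

flags <- looks_surname_first(c("WANG Y", "AYALA-ROMERO JA",
                               "Mohammed Saqr", "MOHAMMED LI"))
flags
```

The last input shows the ambiguity described above: "MOHAMMED LI" passes the test because "LI" looks like initials, so it would be read surname-first unless surname_first = "no" is supplied.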

Examples

parse_names(c("Saqr, Mohammed", "Lopez-Pernas, Sonsoles"))

# Alternative output styles
parse_names("Saqr, Mohammed", format = "last_initials")  # "Saqr M."
parse_names("Saqr, Mohammed", format = "last")            # "Saqr"
parse_names("van der Berg, Jan", format = "last_initials") # "van der Berg J."

x <- parse_names("Saqr, M.")
x
attr(x, "parts")

# Particles and suffixes
parse_names(c("van der Berg, Jan", "Smith, John, Jr.", "de la Cruz, Ana"))

# Scopus / bibnets surname-first labels are detected automatically
parse_names(c("WANG Y", "AYALA-ROMERO JA", "VAN DER BERG J"))
parse_names("WANG Y", format = "last_initials")          # "WANG Y."

# Override the auto heuristic when you know the convention
parse_names("Wang Yong", surname_first = "yes")          # "Yong Wang"

# Group authors are detected and left unchanged
parse_names("WHO Collaborating Group")

# Recommended workflow: normalise the authors list-column, then build
papers <- data.frame(id = c("P1", "P2", "P3"), stringsAsFactors = FALSE)
papers$authors <- list(
  c("Saqr, Mohammed", "Lopez, Ana"),
  c("SAQR M",         "Lopez, Ana"),
  c("Saqr, Mohammed", "Chen, Wei"))
papers$authors <- lapply(papers$authors, parse_names,
                         format = "last_initials")
net <- author_network(papers, type = "collaboration")
net

List of all available counting methods

Description

List of all available counting methods

Usage

position_independent_counts()

Value

Character vector of method names.

Print a bibnets network edge list

Description

Print a bibnets network edge list

Usage

## S3 method for class 'bibnets_network'
print(x, n = 10L, ...)

Arguments

x

A bibnets_network data frame.

n

Integer. Number of rows to show. Default 10.

...

Ignored.

Value

Invisibly returns x.

Examples

data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
print(edges)

Prune a weighted edge list

Description

Reduces a weighted edge list by removing weak or excess edges.

Usage

prune(edges, threshold = NULL, top_n = NULL)

Arguments

edges

A data frame with at least columns from, to, and weight.

threshold

Numeric. Keep only edges with weight >= threshold.

top_n

Integer. For each node, keep only its top_n strongest edges. An edge is kept if it is in the top top_n for either endpoint.

Value

The filtered edge data frame (same columns as input).

Examples

edges <- data.frame(
  from   = c("A","A","A","B","B","C"),
  to     = c("B","C","D","C","D","D"),
  weight = c(5,  1,  2,  4,  1,  3)
)

# Keep only edges with weight >= 3
prune(edges, threshold = 3)

# Keep the 2 strongest edges per node
prune(edges, top_n = 2)
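
The "either endpoint" rule for top_n can be traced by hand on the same data. The following base R sketch re-implements the documented rule for illustration only; how prune() itself breaks ties between equal weights is not specified here, and this sketch simply keeps the first in row order.

```r
edges <- data.frame(
  from   = c("A","A","A","B","B","C"),
  to     = c("B","C","D","C","D","D"),
  weight = c(5,  1,  2,  4,  1,  3),
  stringsAsFactors = FALSE
)
top_n <- 2

# For one node, return the row indices of its top_n strongest edges.
top_rows <- function(node) {
  rows <- which(edges$from == node | edges$to == node)
  rows[order(-edges$weight[rows])][seq_len(min(top_n, length(rows)))]
}

# An edge survives if it is in the top_n of either endpoint.
nodes <- unique(c(edges$from, edges$to))
keep <- sort(unique(unlist(lapply(nodes, top_rows))))
edges[keep, ]
```

With top_n = 2 the edges A-C and B-D are dropped, because neither is among the two strongest edges of either of its endpoints.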

Read bibliometric data

Description

Universal reader that handles files, folders, format detection, and generic CSV input. Accepts a single file, multiple files, or a directory.

Usage

read_biblio(
  path,
  format = "auto",
  id = NULL,
  authors = NULL,
  keywords = NULL,
  references = NULL,
  countries = NULL,
  affiliations = NULL,
  journal = NULL,
  sep = ";",
  list_cols = NULL,
  ...,
  actors = NULL
)

Arguments

path

Character. Path to a file, a vector of file paths, or a directory containing export files.

format

Character. File format:

"auto": Default. Auto-detect from file content.
"scopus": Scopus CSV.
"wos": Web of Science plaintext.
"wos_tab": Web of Science tab-delimited.
"bibtex": BibTeX .bib file.
"ris": RIS file.
"dimensions": Dimensions CSV.
"lens": Lens.org CSV.
"openalex_csv": Flat OpenAlex CSV export (pipe-delimited fields).
"generic": Any CSV. Map its columns with id, authors, keywords, references, countries, affiliations, journal. Inferred automatically when any of those arguments is supplied, so format = "generic" is optional in that case.

id

Character. Column name for document identifier. Only used when format = "generic". Default NULL (uses row numbers).

authors, keywords, references, countries, affiliations

Character. For format = "generic", the name of the source column to map onto that standard field. Its cells are split on sep into a list-column. For example, authors = "Author Names" reads the "Author Names" column into the standard authors list-column.

journal

Character. For format = "generic", the name of the source column to use as the (scalar) journal field. Not split.

sep

Character. Delimiter for splitting the mapped multi-valued columns. Default ";".

list_cols

Character vector. For format = "generic", additional columns to split into list-columns in place (keeping their original names), for fields without a dedicated argument above.

...

Additional arguments passed to the format-specific reader.

actors

Deprecated. Use the entity arguments (authors, keywords, ...) or list_cols instead.

Value

A data frame.

Examples

# Auto-detect format from file content (here: a bundled OpenAlex CSV)
f <- system.file("extdata", "openalex_works.csv", package = "bibnets")
data <- read_biblio(f)
head(data[, c("id", "title", "year", "journal")])

# Read multiple files at once; auto-detects each format
f_scopus <- system.file("extdata", "scopus_sample.csv", package = "bibnets")
f_wos    <- system.file("extdata", "wos_sample.txt",  package = "bibnets")
combined <- read_biblio(c(f_scopus, f_wos))
head(combined[, c("id", "title", "year", "journal")])

# Read every supported export in a directory (here: the bundled extdata)
folder <- system.file("extdata", package = "bibnets")
all_data <- read_biblio(folder)
nrow(all_data)

# Custom CSV: map each source column onto a standard field by name.
# Naming columns implies format = "generic" (no need to pass it).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(
  doc_id  = c("a", "b"),
  Authors = c("Smith J; Jones A", "Davis M"),
  Keywords = c("networks; bibliometrics", "analytics")
), tmp, row.names = FALSE)
generic <- read_biblio(tmp,
                       id = "doc_id",
                       authors = "Authors",
                       keywords = "Keywords",
                       sep = ";")
head(generic)

Read a BibTeX file

Description

Parses a .bib file into a standardized bibliometric data frame. Note: standard BibTeX does not contain cited references, so the references column will be empty unless the file includes a non-standard cited-references or note field with reference data.

Usage

read_bibtex(file, encoding = "UTF-8")

Arguments

file

Path to a .bib file.

encoding

Character. File encoding. Default "UTF-8".

Value

A data frame in the standard bibnets format: id, title, year, journal, doi, cited_by_count, abstract, type, plus list-columns authors, references (typically empty for BibTeX), and keywords.

Examples

# Write a minimal BibTeX entry to a temp file, then read it
bib <- '@article{smith2020,
  title  = {Bibliometric networks},
  author = {Smith, J. and Jones, K.},
  journal = {Test Journal},
  year   = {2020},
  doi    = {10.1000/test}
}'
f <- tempfile(fileext = ".bib")
writeLines(bib, f)
data <- read_bibtex(f)
data[, c("id", "title", "year", "journal", "doi")]
unlink(f)

Convert Crossref API data to bibnets format

Description

Takes the output of rcrossref::cr_works() (the $data tibble/data frame) and converts it to the standardized bibnets format.

Usage

read_crossref(data)

Arguments

data

A data frame from cr_works(...)$data.

Value

A data frame in the standard bibnets format: id, title, year, journal, doi, cited_by_count, abstract, type, plus list-columns authors, references, and keywords.

Examples

# Construct a minimal data frame matching the structure of
# rcrossref::cr_works(...)$data. In practice, pass that data frame directly.
raw <- data.frame(
  doi = c("10.1/a", "10.2/b"),
  title = c("First paper", "Second paper"),
  issued = c("2022-01-01", "2021-06-15"),
  container.title = c("Journal A", "Journal B"),
  is.referenced.by.count = c("3", "9"),
  type = c("journal-article", "journal-article"),
  stringsAsFactors = FALSE
)
raw$author <- list(
  data.frame(given = c("Jane", "Anne"),
             family = c("Smith", "Jones"),
             stringsAsFactors = FALSE),
  data.frame(given = "Mark", family = "Davis", stringsAsFactors = FALSE)
)
data <- read_crossref(raw)
head(data[, c("id", "title", "year", "journal")])

Read Dimensions CSV export

Description

Parses a CSV file exported from Dimensions into a standardized bibliometric data frame.

Usage

read_dimensions(file, encoding = "UTF-8")

Arguments

file

Path to a Dimensions CSV export file.

encoding

Character. File encoding. Default "UTF-8".

Value

A data frame in the standard bibnets format: id, title, year, journal, doi, cited_by_count, abstract, type, plus list-columns authors, references, and keywords. Dimensions-specific extras: affiliations (list-column), countries (list-column).

Examples

f <- system.file("extdata", "dimensions_sample.csv", package = "bibnets")
data <- read_dimensions(f)
head(data[, c("id", "title", "year", "journal")])

Read a generic CSV with user-specified columns

Description

Read a generic CSV with user-specified columns

Usage

read_generic(
  file,
  id = NULL,
  sep = ";",
  authors = NULL,
  keywords = NULL,
  references = NULL,
  countries = NULL,
  affiliations = NULL,
  journal = NULL,
  list_cols = NULL
)

Read Lens.org CSV export

Description

Parses a CSV file exported from Lens.org into a standardized bibliometric data frame.

Usage

read_lens(file, encoding = "UTF-8")

Arguments

file

Path to a Lens.org CSV export file.

encoding

Character. File encoding. Default "UTF-8".

Value

A data frame in the standard bibnets format: id, title, year, journal, doi, cited_by_count, abstract, type, plus list-columns authors, references, and keywords.

Examples

f <- system.file("extdata", "lens_sample.csv", package = "bibnets")
data <- read_lens(f)
head(data[, c("id", "title", "year", "journal")])

Convert OpenAlex data to bibnets format

Description

Takes the output of openalexR::oa_fetch() (a tibble/data frame of works) and converts it to the standardized bibnets format with list-columns.

Usage

read_openalex(data)

Arguments

data

A data frame from oa_fetch(entity = "works", ...). Must contain at least an id column. Common columns include display_name, publication_year, so, doi, cited_by_count, referenced_works, ab, and author (nested).

Value

A data frame in the standard bibnets format: id, title, year, journal, doi, cited_by_count, abstract, type, plus list-columns authors, references, and keywords.

Examples

# Construct a minimal data frame matching the structure returned by
# openalexR::oa_fetch(entity = "works", ...). In practice, pass the
# result of oa_fetch() directly.
raw <- data.frame(
  id = c("W123", "W456"),
  display_name = c("First paper", "Second paper"),
  publication_year = c(2022L, 2021L),
  so = c("Journal A", "Journal B"),
  doi = c("https://doi.org/10.1/a", "https://doi.org/10.2/b"),
  cited_by_count = c(5L, 12L),
  stringsAsFactors = FALSE
)
raw$author <- list(
  data.frame(au_display_name = c("Smith J", "Jones A"),
             stringsAsFactors = FALSE),
  data.frame(au_display_name = "Davis M", stringsAsFactors = FALSE)
)
raw$referenced_works <- list(c("W100", "W200"), "W123")
data <- read_openalex(raw)
head(data[, c("id", "title", "year", "journal", "doi")])

Read a flat OpenAlex CSV export

Description

Reads the flat CSV format downloaded directly from the OpenAlex website (openalex.org/works exports). Multi-value fields are pipe-delimited (|). This is distinct from the nested tibble produced by openalexR::oa_fetch(), which is handled by read_openalex().

Usage

read_openalex_csv(file, sep = "|")

Arguments

file

Path to the CSV file.

sep

Character. Delimiter for multi-value fields. Default "|".

Value

A data frame in the standard bibnets format: id, title, year, journal, doi, cited_by_count, abstract, type, plus list-columns authors, references, keywords, affiliations, countries. abstract and references are always NA / empty (not available in the flat export).

Examples

f <- system.file("extdata", "openalex_works.csv", package = "bibnets")
data <- read_openalex_csv(f)

Read an RIS file

Description

Parses a .ris file into a standardized bibliometric data frame. Like BibTeX, standard RIS does not include cited references.

Usage

read_ris(file, encoding = "UTF-8")

Arguments

file

Path to a .ris file.

encoding

Character. File encoding. Default "UTF-8".

Value

A standardized bibliometric data frame, as described above. Because standard RIS does not include cited references, expect the references column to be empty.

Examples

# Write a minimal RIS record to a temp file, then read it
ris <- "TY  - JOUR
AU  - Smith, J.
AU  - Jones, K.
TI  - Bibliometric networks
JO  - Test Journal
PY  - 2020
DO  - 10.1000/test
ER  - "
f <- tempfile(fileext = ".ris")
writeLines(ris, f)
data <- read_ris(f)
data[, c("id", "title", "year", "journal", "doi")]
unlink(f)

Read Scopus CSV export

Description

Parses a CSV file exported from Scopus into a standardized bibliometric data frame with list-columns for multi-valued fields.

Usage

read_scopus(file, encoding = "UTF-8")

Arguments

file

Path to a Scopus CSV export file.

encoding

Character. File encoding. Default "UTF-8".

Value

A data frame in the standard bibnets format: id, title, year, journal, doi, cited_by_count, abstract, type, plus list-columns authors, references, and keywords. Scopus-specific extras: index_keywords (list-column), affiliations (character), language (character).

Examples

f <- system.file("extdata", "scopus_sample.csv", package = "bibnets")
data <- read_scopus(f)
head(data[, c("id", "title", "year", "journal")])

Read a single bibliometric file

Description

Read a single bibliometric file

Usage

read_single_biblio(
  file,
  format,
  id,
  sep,
  authors = NULL,
  keywords = NULL,
  references = NULL,
  countries = NULL,
  affiliations = NULL,
  journal = NULL,
  list_cols = NULL,
  ...
)

Read Web of Science plaintext or tab-delimited export

Description

Parses a Web of Science export file (plaintext or tab-delimited) into a standardized bibliometric data frame.

Usage

read_wos(file, format = "plaintext")

Arguments

file

Path to a WoS export file (.txt).

format

Character. "plaintext" (default) for WoS tagged format, or "tab" for tab-delimited export.

Value

A standardized bibliometric data frame, as described above.

Examples

f <- system.file("extdata", "wos_sample.txt", package = "bibnets")
data <- read_wos(f)
head(data[, c("id", "title", "year", "journal")])

Build a reference network

Description

Constructs a co-citation or equivalence network among cited references. Two references are linked when they are cited together by the same paper.

Usage

reference_network(
  data,
  type = "co_citation",
  counting = "full",
  similarity = "none",
  threshold = 0,
  min_occur = 1L,
  top_n = NULL,
  self_loops = FALSE,
  deduplicate = TRUE,
  format = "edgelist",
  references = "references",
  sep = ";",
  strip_quotes = TRUE,
  id = NULL
)

Arguments

data

A data frame with id and a references column (list-column or delimited string).

type

Character. "co_citation" (default) or "equivalence".

counting

Character. Counting method. Default "full".

similarity

Character. Similarity measure. Default "none".

threshold

Numeric. Minimum edge weight. Default 0.

min_occur

Integer. Minimum times a reference must be cited. Default 1.

top_n

Integer or NULL. Return only the top n edges by weight. Default NULL (all edges).

self_loops

Logical. If TRUE, include self-loops (an entity linked to itself). Default FALSE.

deduplicate

format

Character. Output format:

"edgelist": Default. A bibnets_network data frame with columns from, to, weight, count.
"gephi": Gephi-ready data frame: Source, Target, Weight, Count, Type.
"igraph": An igraph graph object (requires igraph).
"cograph": A cograph_network object (requires cograph).
"matrix": A sparse adjacency matrix.

references

Character. Name of the column containing cited references. Default "references".

sep

Character. Separator used to split the references column when it is a delimited string rather than a list-column. Default ";".

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each reference.

id

Optional. Name of the column to use as the work identifier (the matrix-row dimension). If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Value

Depends on format: a bibnets_network data frame (default), a Gephi-ready data frame, an igraph graph, a cograph_network, or a sparse matrix.

Examples

data(biblio_data)
reference_network(biblio_data)
reference_network(biblio_data, similarity = "association")

Resolve the work-identifier column

Description

Materializes a top-level id column that the network pipeline uses to index works (matrix rows). The resolution rules are listed under Details.

Usage

resolve_id(data, id = NULL)

Arguments

data

A data frame.

id

NULL or a single column name (character scalar).

Details

id = NULL (default): use the existing id column if one is present, otherwise fall back to row numbers (seq_len(nrow(data))).
id = "colname": copy the named column to id. The column must exist.

When id names a column other than "id" and the data already has a distinct "id" column, the request is ambiguous (the existing "id" column might itself be an entity field). Rather than silently overwriting it, this errors and asks the caller to resolve the conflict.

Value

data with a guaranteed character id column.
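
The rules under Details can be restated as a short base R function. resolve_id_sketch() is a hypothetical name, and the sketch simplifies the conflict rule by raising an error whenever both "id" and another named column exist; it is not the package source.

```r
# Hypothetical re-statement of the documented rules; not the package source.
resolve_id_sketch <- function(data, id = NULL) {
  if (is.null(id)) {
    # Use an existing id column, otherwise fall back to row numbers.
    if (!"id" %in% names(data)) data$id <- seq_len(nrow(data))
  } else {
    if (!id %in% names(data)) stop("Column '", id, "' not found.")
    # A separate existing "id" column makes the request ambiguous.
    if (id != "id" && "id" %in% names(data)) {
      stop("Both 'id' and '", id, "' exist; resolve the conflict first.")
    }
    data$id <- data[[id]]
  }
  data$id <- as.character(data$id)
  data
}

ids_default <- resolve_id_sketch(data.frame(title = c("x", "y")))$id
ids_named <- resolve_id_sketch(data.frame(eid = c(10, 11)), id = "eid")$id
ids_default
ids_named
```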

Resolve file paths from a file, vector of files, or directory

Description

Resolve file paths from a file, vector of files, or directory

Usage

resolve_paths(path)

Scopus dataset — Green Cloud Computing and Quantization (2020–2025)

Description

First 500 records from a Scopus bibliometric export on the intersection of green cloud computing and quantization, covering 2020–2025. Includes full references, author keywords, index keywords, and affiliations.

Usage

scopus_quantum_cloud

Format

A data frame with 499 rows and 12 columns:

id: Scopus EID.
title: Work title.
year: Publication year (integer).
journal: Source title.
doi: DOI string without the https://doi.org/ prefix.
cited_by_count: Times cited in Scopus.
abstract: Abstract text.
type: Document type ("Article", "Review", etc.).
authors: List-column of author name strings.
references: List-column of cited reference strings.
keywords: List-column of author keywords.
affiliations: List-column of affiliation strings.

Source

Scopus bibliometric export. Dataset archived at doi:10.5281/zenodo.17142636 (CC BY 4.0).

Examples

data(scopus_quantum_cloud)
author_network(scopus_quantum_cloud, "collaboration")
keyword_network(scopus_quantum_cloud)
document_network(scopus_quantum_cloud, "coupling", similarity = "cosine")

Build a source (journal) network

Description

Constructs a network between publication sources (journals, book series).

Usage

source_network(
  data,
  type = "coupling",
  counting = "full",
  similarity = "none",
  threshold = 0,
  min_occur = 1L,
  top_n = NULL,
  self_loops = FALSE,
  deduplicate = TRUE,
  format = "edgelist",
  journal = "journal",
  sep = ";",
  references_sep = ";",
  strip_quotes = TRUE,
  id = NULL
)

Arguments

data

A data frame with id and journal (character column). For coupling, also needs references. For co-citation, needs a cited_journals list-column.

type

Character. "coupling" (default), "co_citation", or "equivalence".

counting

Character. Counting method. Default "full".

similarity

Character. Similarity measure. Default "none".

threshold

Numeric. Minimum edge weight. Default 0.

min_occur

Integer. Minimum papers per source. Default 1.

top_n

Integer or NULL. Return only the top n edges by weight. Default NULL (all edges).

self_loops

Logical. If TRUE, include self-loops (an entity linked to itself). Default FALSE.

deduplicate

format

Character. Output format:

"edgelist": Default. A bibnets_network data frame with columns from, to, weight, count.
"gephi": Gephi-ready data frame: Source, Target, Weight, Count, Type.
"igraph": An igraph graph object (requires igraph).
"cograph": A cograph_network object (requires cograph).
"matrix": A sparse adjacency matrix.

journal

Character. Name of the column containing the publication source. Default "journal". Use this to point at any column of a custom data set, e.g. journal = "Source title".

sep

references_sep

Character. Separator for the references column in type = "coupling". Default ";".

strip_quotes

Logical. If TRUE (default), surrounding quote characters are removed from each entity.

id

Optional. Name of the column to use as the work identifier (the matrix-row dimension). If NULL (default), an existing id column is used when present, otherwise row numbers are used.

Value

Depends on format: a bibnets_network data frame (default), a Gephi-ready data frame, an igraph graph, a cograph_network, or a sparse matrix.

Examples

data(biblio_data)
source_network(biblio_data, "coupling")

Parse semicolon-delimited strings into list-column

Description

Splits delimited strings (semicolon-separated by default, as is common in Scopus/WoS exports) into character vectors, trimming whitespace.

Usage

split_field(x, sep = ";")

Arguments

x

A character vector of delimited strings (semicolon-delimited by default).

sep

Character. Delimiter. Default ";".

Value

A list of character vectors.

Examples

split_field(c("Alice; Bob; Carol", "Dave; Eve"))
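
For readers who want to see the operation spelled out, an equivalent split can be written in base R as below. This is a sketch of the described behaviour (split, then trim); how split_field() treats NA or empty strings is not covered.

```r
x <- c("Alice; Bob; Carol", "Dave; Eve")

# Split on the literal delimiter, then trim whitespace around each piece.
parts <- lapply(strsplit(x, ";", fixed = TRUE), trimws)
parts
```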

Standardize author names

Description

Converts names to uppercase, normalises whitespace, and removes dots from initials (F.J. → FJ). Name order and format are preserved, consistent with how bibliometrix handles multi-source data.

Usage

standardize_authors(x, flip_names = FALSE)

Arguments

x

Character vector of author names.

flip_names

Logical. If TRUE, names in "Last, First" format are reordered to "First Last". Off by default; enable only when all names in x reliably follow the "Last, First" convention.

Value

Character vector, uppercased and cleaned.
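
The three cleaning steps can be approximated with base R string functions. The sketch below removes every dot, which is a simplifying assumption (the function is described as removing dots from initials), and the input names are invented for the example.

```r
x <- c("  Saqr,   M. ", "lopez-pernas, s.", "Garcia, F.J.")

cleaned <- toupper(x)                             # uppercase
cleaned <- gsub(".", "", cleaned, fixed = TRUE)   # drop dots (all of them here)
cleaned <- gsub("[[:space:]]+", " ", cleaned)     # collapse runs of whitespace
cleaned <- trimws(cleaned)                        # trim both ends
cleaned
```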

Strip surrounding quote characters from entity labels

Description

Removes leading/trailing double-quote characters (straight ", the CSV doubled "", and curly quotes) plus surrounding whitespace, so quoted values such as "Alice" or ""Bob"" become Alice / Bob. Quotes inside a label (e.g. an apostrophe in O'Brien) are left untouched.

Usage

strip_surrounding_quotes(x)

Arguments

x

Character vector.

Value

Character vector with surrounding quotes/whitespace removed.

Summarise a bibnets network

Description

Summarise a bibnets network

Usage

## S3 method for class 'bibnets_network'
summary(object, ...)

Arguments

object

A bibnets_network data frame.

...

Ignored.

Value

Invisibly returns object.

Examples

data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
summary(edges)

Build time-windowed networks

Description

Splits data by time windows and builds a separate network for each window using any network function.

Usage

temporal_network(
  data,
  network_fun,
  ...,
  window = 3,
  step = NULL,
  strategy = "fixed",
  time_col = "year"
)

Arguments

data

A data frame with a numeric time column.

network_fun

Function or character string naming a network function (e.g., author_network, "reference_network", conetwork).

...

Additional arguments passed to network_fun (e.g., type, counting, similarity, threshold, top_n).

window

Integer. Width of each time window in units of the time column (years, months, quarters, etc.). Default 3.

step

Integer or NULL. Step size between windows. Default NULL (equals window for fixed, 1 for sliding).

strategy

Character. Time window strategy:

"fixed": Disjoint non-overlapping windows (default).
"sliding": Overlapping windows advancing by step units.
"cumulative": Each window starts at the earliest value and extends further.

time_col

Character. Name of the column containing the time variable. Default "year". Works with any numeric time unit: years, months, quarters, semesters, weeks, etc. (e.g., "month", "quarter", "time").

Value

A named list of data frames (edge lists). Names are window labels like "2018-2020".

Examples

data(biblio_data)

# Fixed 3-year windows
temporal_network(biblio_data, author_network, "collaboration")

# Sliding window
temporal_network(biblio_data, author_network, "collaboration",
                 window = 2, strategy = "sliding")

# Cumulative
temporal_network(biblio_data, reference_network,
                 threshold = 0, strategy = "cumulative", window = 2)

# With string name
temporal_network(biblio_data, "keyword_network", window = 3)
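
The three strategies differ only in how window boundaries are generated. The base R sketch below computes start and end values for each strategy under stated assumptions: window_bounds() is a hypothetical helper, windows are inclusive at both ends, the last fixed window may extend past the latest value, and the cumulative step defaults to the window width. The package may differ in any of these details.

```r
# Hypothetical helper: start/end of each window for a numeric time vector.
window_bounds <- function(time, window = 3, step = NULL, strategy = "fixed") {
  lo <- min(time)
  hi <- max(time)
  if (strategy == "sliding") {
    if (is.null(step)) step <- 1
    starts <- seq(lo, max(lo, hi - window + 1), by = step)
    ends <- starts + window - 1
  } else if (strategy == "cumulative") {
    if (is.null(step)) step <- window   # assumed default for this sketch
    first_end <- lo + window - 1
    ends <- seq(first_end, max(first_end, hi + step - 1), by = step)
    ends <- unique(pmin(ends, hi))      # never run past the latest value
    starts <- rep(lo, length(ends))
  } else {
    if (is.null(step)) step <- window
    starts <- seq(lo, hi, by = step)
    ends <- starts + window - 1
  }
  data.frame(start = starts, end = ends,
             label = paste(starts, ends, sep = "-"),
             stringsAsFactors = FALSE)
}

years <- 2018:2023
window_bounds(years, window = 3)$label
window_bounds(years, window = 2, strategy = "sliding")$label
window_bounds(years, window = 2, strategy = "cumulative")$label
```

Each row of the result could then be used to subset the data, for example data[data$year >= start & data$year <= end, ], before passing the subset to a network function.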

Prepare network for cograph::splot()

Description

Converts a bibnets edge list to a cograph_network object by calling cograph::as_cograph(). Optionally merges node metadata (e.g., from local_citations()) into the network's node table so attributes like lcs or year can be used directly in splot() aesthetic parameters (e.g., node_size = "lcs").

Usage

to_cograph(edges, nodes = NULL, directed = FALSE)

Arguments

edges

A data frame with at least from, to, weight columns.

nodes

Optional data frame of node attributes with an id column (e.g., output of local_citations()). All columns are merged into the cograph_network$nodes table and become available as aesthetic mappings.

directed

Logical. Default FALSE.

Details

Note: bibnets edge lists (from, to, weight) are accepted directly by cograph::splot() without conversion. This function is only needed when you want to attach node-level metadata.

Value

A cograph_network object (S3 list with $nodes and $edges).

Examples


data(biblio_data)

# Without metadata: splot() accepts bibnets edges directly
edges <- author_network(biblio_data, "collaboration")

# With metadata: document network + local citation scores as node size
edges <- document_network(biblio_data, type = "coupling")
nodes <- local_citations(biblio_data)   # keyed by document id
net   <- to_cograph(edges, nodes = nodes)

Export to Gephi node and edge tables

Description

Converts a bibnets edge list (and optional node table) to the CSV format expected by Gephi's Data Laboratory. Column names are remapped to Gephi conventions (Source, Target, Weight, Id, Label).

Usage

to_gephi(edges, nodes = NULL, file = NULL, directed = FALSE)

Arguments

edges

A data frame with at least from, to, weight columns.

nodes

Optional data frame of node attributes. Must contain an id column. All other columns are included as Gephi node attributes.

file

Optional directory path. If supplied, writes nodes.csv and edges.csv into that directory. If NULL (default), returns a list.

directed

Logical. Sets the Type column. Default FALSE.

Value

If file = NULL: a list with $nodes and $edges data frames. If file is a directory path: writes nodes.csv and edges.csv into that directory and invisibly returns the file paths.

Examples

data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
gephi <- to_gephi(edges)
head(gephi$edges)
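
The column remapping described above can be pictured with a small base R sketch. It is not the code of to_gephi(): the toy edge list, the choice of Label equal to Id, and the Type values "Directed" and "Undirected" (the strings Gephi itself uses) are assumptions for illustration.

```r
# Toy edge list in bibnets shape
edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(2, 1, 4),
  stringsAsFactors = FALSE
)
directed <- FALSE

# Edge table with Gephi column names; Type is recycled to every row
gephi_edges <- data.frame(
  Source = edges$from,
  Target = edges$to,
  Weight = edges$weight,
  Type   = if (directed) "Directed" else "Undirected",
  stringsAsFactors = FALSE
)

# Node table: one row per endpoint, label assumed equal to the id
ids <- sort(unique(c(edges$from, edges$to)))
gephi_nodes <- data.frame(Id = ids, Label = ids, stringsAsFactors = FALSE)

# Write both tables into a temporary directory
out_dir <- file.path(tempdir(), "gephi_sketch")
dir.create(out_dir, showWarnings = FALSE)
utils::write.csv(gephi_edges, file.path(out_dir, "edges.csv"), row.names = FALSE)
utils::write.csv(gephi_nodes, file.path(out_dir, "nodes.csv"), row.names = FALSE)
```

Writing without row names matters here, since an extra unnamed first column would otherwise show up in the imported tables.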

Export to GraphML

Description

Writes a bibnets edge list (and optional node attributes) to a GraphML file using only base R; no XML package is required.

Usage

to_graphml(edges, nodes = NULL, file = NULL, directed = FALSE)

Arguments

edges

A data frame with at least from, to, weight columns.

nodes

Optional data frame of node attributes with an id column.

file

File path to write. If NULL (default), returns the GraphML as a character string.

directed

Logical. Default FALSE.

Value

If file = NULL: GraphML as a character string. Otherwise writes the file and returns the path invisibly.

Examples

data(biblio_data)
edges <- keyword_network(biblio_data)
xml <- to_graphml(edges)
cat(substr(xml, 1, 300))
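
To make the "pure base R" approach concrete, here is a minimal sketch of assembling GraphML with sprintf() and paste(). The helper xml_escape(), the key id "w", and the toy keywords are hypothetical, and the output of to_graphml() may be laid out differently.

```r
# Toy keyword edge list; "R & D" exercises XML escaping
edges <- data.frame(
  from   = c("machine learning", "machine learning"),
  to     = c("R & D", "education"),
  weight = c(2, 1),
  stringsAsFactors = FALSE
)

# Escape characters that are special in XML text and attribute values
# ("&" must be handled first so later entities are not double-escaped)
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  gsub("\"", "&quot;", x, fixed = TRUE)
}

ids <- unique(c(edges$from, edges$to))
node_xml <- sprintf('    <node id="%s"/>', xml_escape(ids))
edge_xml <- sprintf(
  '    <edge source="%s" target="%s"><data key="w">%s</data></edge>',
  xml_escape(edges$from), xml_escape(edges$to), format(edges$weight)
)

# Assemble the document: header, weight key declaration, graph body
graphml <- paste(c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
  '  <key id="w" for="edge" attr.name="weight" attr.type="double"/>',
  '  <graph edgedefault="undirected">',
  node_xml,
  edge_xml,
  '  </graph>',
  '</graphml>'
), collapse = "\n")

cat(graphml)
```

Keyword and author labels often contain ampersands or angle brackets, so escaping before interpolation is the step most likely to matter in practice.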

Convert edge data frame to igraph

Description

Converts a bibnets edge data frame to an igraph graph object.

Usage

to_igraph(edges, directed = FALSE)

Arguments

edges

A data frame with at least from, to, weight columns, as returned by any network function in bibnets.

directed

Logical. Default FALSE.

Value

An igraph graph object.

Examples


data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
g <- to_igraph(edges)
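
For orientation, a comparable conversion can be done directly with igraph on a toy edge list. This sketch calls igraph::graph_from_data_frame() and is only an assumption about what an equivalent conversion looks like; to_igraph() may differ in how it names or stores attributes.

```r
# Toy edge list in bibnets shape
edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(2, 1, 4),
  stringsAsFactors = FALSE
)

# igraph is a suggested package, so guard its use
if (requireNamespace("igraph", quietly = TRUE)) {
  # The first two columns define the edges; remaining columns
  # (here: weight) become edge attributes
  g <- igraph::graph_from_data_frame(edges, directed = FALSE)

  n_nodes  <- igraph::vcount(g)
  n_edges  <- igraph::ecount(g)
  weights  <- igraph::E(g)$weight

  # Weighted degree; uses the "weight" edge attribute by default
  node_strength <- igraph::strength(g)
  print(node_strength)
}
```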

Convert edge data frame to adjacency matrix

Description

Converts a bibnets edge data frame to a sparse adjacency matrix.

Usage

to_matrix(edges, symmetric = TRUE)

Arguments

edges

A data frame with from, to, weight columns.

symmetric

Logical. If TRUE (default), produces a symmetric matrix.

Value

A sparse Matrix.

Examples

data(biblio_data)
edges <- reference_network(biblio_data, min_occur = 2)
to_matrix(edges)
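
The sketch below shows one way a symmetric sparse adjacency matrix can be built from an edge list with Matrix::sparseMatrix(). It is an illustration under assumed toy data, not the body of to_matrix(); in particular, mirroring each edge is only one possible reading of symmetric = TRUE.

```r
library(Matrix)

# Toy edge list in bibnets shape (no self-loops, no duplicate pairs)
edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(2, 1, 4),
  stringsAsFactors = FALSE
)

# Map node labels to row/column positions
ids <- sort(unique(c(edges$from, edges$to)))
i <- match(edges$from, ids)
j <- match(edges$to, ids)

# Mirror every edge so that m[i, j] == m[j, i].
# Note: sparseMatrix() sums repeated (i, j) pairs, so self-loops
# would be counted twice with this approach.
m <- sparseMatrix(
  i = c(i, j),
  j = c(j, i),
  x = c(edges$weight, edges$weight),
  dims = c(length(ids), length(ids)),
  dimnames = list(ids, ids)
)
m
```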

Convert edge data frame to tbl_graph

Description

Converts a bibnets edge data frame to a tidygraph tbl_graph object.

Usage

to_tbl_graph(edges, directed = FALSE)

Arguments

edges

A data frame with at least from, to, weight columns.

directed

Logical. Default FALSE.

Value

A tbl_graph object (tidygraph).

Examples


data(biblio_data)
edges <- keyword_network(biblio_data)
tg <- to_tbl_graph(edges)

Warn when a separator likely failed to split a multi-entity column

Description

Splitting with the wrong separator silently yields one "entity" per row (e.g., a whole author byline as a single node). The heuristic triggers a warning when no row was split into more than one entity, yet most non-empty strings contain a common structural delimiter.

Usage

warn_if_sep_mismatch(col, parts, field, sep)

Details

Only structural delimiters (";", "|", tab) are considered, because they essentially never occur inside a single legitimate label. Commas and " and " are deliberately excluded: they appear inside valid single values (e.g. "Last, First" author names, one-reference-per-row citation strings, or organisations like "Smith and Sons"), so warning on them would mislead users with correct data.
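
The heuristic can be sketched in a few lines of base R. The helper name sep_looks_wrong(), its signature, and the 0.5 cut-off used for "most" are assumptions for illustration; the actual function takes already-split parts and may use a different threshold.

```r
# Hypothetical re-statement of the heuristic; not the package function.
# Returns TRUE when `sep` split nothing, yet most non-empty strings
# contain one of the other structural delimiters.
sep_looks_wrong <- function(col, sep, delims = c(";", "|", "\t")) {
  col <- col[!is.na(col) & nzchar(trimws(col))]
  if (length(col) == 0L) return(FALSE)

  # Did the chosen separator split any row into more than one entity?
  parts <- strsplit(col, sep, fixed = TRUE)
  any_split <- any(lengths(parts) > 1L)

  # Which rows contain a structural delimiter other than `sep`?
  other <- setdiff(delims, sep)
  has_delim <- Reduce(
    `|`,
    lapply(other, function(d) grepl(d, col, fixed = TRUE)),
    rep(FALSE, length(col))
  )

  # 0.5 is an assumed cut-off for "most"
  !any_split && mean(has_delim) > 0.5
}

authors <- c("Smith, J.; Lee, K.", "Garcia, M.; Chen, L.; Patel, R.", "Okafor, N.")

sep_looks_wrong(authors, sep = "|")   # wrong separator for ";"-delimited data
sep_looks_wrong(authors, sep = ";")   # correct separator
sep_looks_wrong(c("Smith, John", "Lee, Kim"), sep = ";")  # single "Last, First" values
```

Because commas are not in the delimiter set, the last call stays silent for "Last, First" names, which matches the rationale given in the Details.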
