| Title: | Importing, Constructing, and Exporting Bibliometric Networks |
| Version: | 0.6.0 |
| Description: | Imports, constructs, and exports bibliometric networks from scholarly metadata. Reads 'Scopus', 'Web of Science', 'BibTeX', 'RIS', 'OpenAlex', 'Lens.org', 'Dimensions', and 'Crossref' exports. Goes beyond standard co-networks with attention-weighted networks (lead, last, proximity, circular position weights), position-aware counting (harmonic, arithmetic, geometric, golden-ratio), similarity and dissimilarity normalisations, temporal networks with fixed, sliding, and cumulative windows, disparity-filter backbone extraction, historiograph construction, and local citation scoring. Methods described in López-Pernas, Saqr & Apiola (2023) <doi:10.1007/978-3-031-25336-2_5>. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/mohsaqr/bibnets |
| BugReports: | https://github.com/mohsaqr/bibnets/issues |
| Depends: | R (≥ 4.1.0) |
| Imports: | Matrix, stats, utils |
| Suggests: | cograph, igraph, knitr, openalexR, rcrossref, rmarkdown, testthat (≥ 3.0.0), tidygraph |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| LazyDataCompression: | xz |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-18 18:29:30 UTC; mohammedsaqr |
| Author: | Mohammed Saqr |
| Maintainer: | Mohammed Saqr <saqr@saqr.me> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-18 19:00:02 UTC |
Aggregate multi-valued fields by an entity
Description
Groups documents by a single-valued or list-column entity (e.g., author, journal) and pools all values from another list-column (e.g., references, keywords) across documents belonging to that entity.
Usage
aggregate_by_entity(data, entity_field, value_field, min_freq = 1L)
Arguments
data |
A data frame with |
entity_field |
Character. Name of the entity column. If it is a scalar
column (e.g., |
value_field |
Character. Name of the list-column to aggregate
(e.g., |
min_freq |
Integer. Minimum number of papers per entity. Default 1. |
Value
A data frame with columns id (entity name) and value_field
(list-column of pooled values, with duplicates preserved).
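The pooling step can be sketched in base R as follows. This is a minimal illustration that does not call bibnets: the toy data frame `papers` and all variable names are hypothetical, and the package's own implementation may differ.

```r
# Toy data (hypothetical): three papers with a scalar journal column
# and a list-column of cited references.
papers <- data.frame(
  id      = c("W1", "W2", "W3"),
  journal = c("J1", "J1", "J2"),
  stringsAsFactors = FALSE
)
papers$references <- list(c("R1", "R2"), c("R2", "R3"), "R1")

# Pool the references of all papers that share a journal; duplicates are kept.
pooled <- lapply(
  split(papers$references, papers$journal),
  function(refs) unlist(refs, use.names = FALSE)
)

# Drop entities with fewer than min_freq papers (analogous to min_freq).
min_freq <- 1L
n_papers <- table(papers$journal)
pooled   <- pooled[names(n_papers)[n_papers >= min_freq]]

# Assemble the result: one row per entity, pooled values as a list-column.
result <- data.frame(id = names(pooled), stringsAsFactors = FALSE)
result$references <- unname(pooled)
result
```

Here journal J1 ends up with R2 listed twice, which is what "duplicates preserved" means in the Value description.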
Align columns before row-binding bibliographic files
Description
Align columns before row-binding bibliographic files
Usage
align_biblio_columns(dfs)
Apply counting weights to a generic bipartite matrix
Description
Applies position-independent counting to non-author fields (references, keywords, etc.) by modifying the row weights of the bipartite matrix.
Usage
apply_counting(B, counting = "full", network_type = "symmetric")
Arguments
B |
A sparse binary bipartite matrix (works x entities). |
counting |
Character. One of |
network_type |
Character. |
Value
A weighted sparse matrix.
Build an author network
Description
Constructs a network between authors using one of four relationship types and any of 13 counting methods, including 9 position-dependent methods that respect author byline order.
Usage
author_network(
data,
type = "collaboration",
counting = "full",
similarity = "none",
threshold = 0,
min_occur = 1L,
position_weights = c(1, 0.8, 0.6, 0.4),
first_last_weight = 2,
attention = NULL,
top_n = NULL,
self_loops = FALSE,
deduplicate = TRUE,
format = "edgelist",
authors = "authors",
sep = ";",
references_sep = ";",
strip_quotes = TRUE,
id = NULL
)
Arguments
data |
A data frame with at least |
type |
Character. Relationship type:
|
counting |
Character. Counting method. Position-independent methods
( |
similarity |
Character. Similarity measure: |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum number of papers for an author to be included. Default 1. |
position_weights |
Numeric vector. Custom weights for
|
first_last_weight |
Numeric. Multiplier for |
attention |
Character or NULL. Attention-based weighting independent of
|
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
authors |
Character. Name of the column containing authors. Default
|
sep |
Character. Separator used to split the entity column when it
is a plain character column rather than a list-column, e.g. |
references_sep |
Character. Separator for the |
strip_quotes |
Logical. If |
id |
Optional. Name of the column to use as the work identifier
(the matrix-row dimension). If |
Value
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
Examples
data(biblio_data)
author_network(biblio_data, "collaboration")
author_network(biblio_data, "collaboration", counting = "harmonic")
author_network(biblio_data, "collaboration", counting = "geometric",
similarity = "association")
# Custom CSV: any column name, any separator
d <- data.frame(id = 1:3,
Researchers = c("Smith J, Doe A", "Smith J, Lee K",
"Doe A, Lee K"))
author_network(d, authors = "Researchers", sep = ",")
Compute positional author weights for a single paper
Description
Given the number of authors and their positions, returns a weight vector
for positional counting methods. All methods normalize weights to sum to 1
per paper. Position-independent methods (fractional, paper, strength) are
handled by apply_counting() instead.
Usage
author_weights(
n,
counting = "fractional",
position_weights = c(1, 0.8, 0.6, 0.4),
first_last_weight = 2
)
Arguments
n |
Integer. Number of authors. |
counting |
Character. Counting method. |
position_weights |
Numeric vector. Custom weights for
|
first_last_weight |
Numeric. Multiplier for first/last authors
when |
Value
Numeric vector of length n, summing to 1.
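To make the idea of normalized positional weights concrete, the sketch below computes three textbook schemes in base R. The function `sketch_weights` is hypothetical and the formulas are the common literature definitions (harmonic 1/i, arithmetic n + 1 - i, geometric 2^(n - i)); whether author_weights() uses exactly these forms is an assumption.

```r
# Hypothetical helper, not part of bibnets: textbook positional weights,
# normalized so that the weights of one paper sum to 1.
sketch_weights <- function(n, counting = c("harmonic", "arithmetic", "geometric")) {
  counting <- match.arg(counting)
  pos <- seq_len(n)                       # byline positions 1..n
  raw <- switch(counting,
    harmonic   = 1 / pos,                 # 1, 1/2, 1/3, ...
    arithmetic = n + 1 - pos,             # n, n-1, ..., 1
    geometric  = 2^(n - pos)              # 2^(n-1), ..., 2, 1
  )
  raw / sum(raw)
}

sketch_weights(3, "harmonic")
sketch_weights(3, "arithmetic")
sketch_weights(3, "geometric")
```

In all three schemes the first author receives the largest share and the shares decrease along the byline.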
Extract network backbone using the disparity filter
Description
Applies the disparity filter to a weighted edge list. For each edge, it computes an alpha (p-value) from both endpoints and keeps the edge if it is statistically significant from at least one endpoint.
Usage
backbone(edges, alpha = 0.05)
Arguments
edges |
A data frame with at least columns |
alpha |
Numeric. Significance threshold in (0, 1). Default |
Details
The null model asks: given that node i has total strength s_i
distributed uniformly across k_i edges, what is the probability that
a single edge weight is as large as w_{ij}? The answer is
\[ \alpha_{ij} = \left(1 - \frac{w_{ij}}{s_i}\right)^{k_i - 1} \]
An edge is retained if \min(\alpha_{ij}, \alpha_{ji}) < \alpha.
Nodes with only one edge always have \alpha = 0 and are always kept.
Value
The filtered edge data frame with an added alpha column (the
minimum alpha from the two endpoints).
Examples
edges <- data.frame(
from = c("A", "A", "A", "B", "C"),
to = c("B", "C", "D", "C", "D"),
weight = c(10, 1, 1, 8, 1)
)
backbone(edges, alpha = 0.05)
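The alpha formula from Details can be checked by hand on this example edge list. The base-R sketch below applies the formula directly, treating the edges as undirected; it is not the package's code, the helper `alpha_from` is hypothetical, and the single-edge special case mentioned in Details is not handled (no node in this example has only one edge).

```r
edges <- data.frame(
  from   = c("A", "A", "A", "B", "C"),
  to     = c("B", "C", "D", "C", "D"),
  weight = c(10, 1, 1, 8, 1),
  stringsAsFactors = FALSE
)

# Node strength s_i (sum of incident weights) and degree k_i (number of edges).
endpoints <- c(edges$from, edges$to)
strength  <- vapply(split(rep(edges$weight, 2), endpoints), sum, numeric(1))
degree    <- vapply(split(endpoints, endpoints), length, integer(1))

# Alpha seen from one endpoint: (1 - w / s)^(k - 1).
alpha_from <- function(node, w) {
  unname((1 - w / strength[node])^(degree[node] - 1))
}

# Keep an edge if it is significant from at least one endpoint.
edges$alpha <- pmin(alpha_from(edges$from, edges$weight),
                    alpha_from(edges$to, edges$weight))
kept <- edges[edges$alpha < 0.05, ]
kept
```

Under these assumptions only A-B (alpha = 1/36, from A's side) and B-C (alpha = 0.04, from C's side) fall below 0.05; the three weight-1 edges do not.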
Example bibliometric dataset
Description
A small synthetic dataset of 10 scholarly papers with overlapping authors, references, and keywords. Designed for testing and demonstrating all network construction functions in bibnets.
Usage
biblio_data
Format
A data frame with 10 rows and 9 columns:
- id
Unique document identifier (W1–W10).
- title
Document title.
- year
Publication year (2018–2022).
- journal
Source journal (Scientometrics, Journal of Informetrics, JASIST, Quantitative Science Studies).
- doi
DOI string.
- cited_by_count
Times cited.
- authors
List-column of author name strings (6 unique authors).
- references
List-column of cited reference IDs (10 unique refs, R1–R10). Each paper cites exactly 4 references.
- keywords
List-column of keyword strings (24 unique keywords). Each paper has 3 keywords.
Examples
data(biblio_data)
reference_network(biblio_data)
document_network(biblio_data, "coupling")
author_network(biblio_data, "collaboration")
Build a weighted bipartite matrix using positional counting
Description
For position-dependent counting methods, this replaces the binary bipartite matrix entries with positional weights derived from author order.
Usage
build_author_bipartite(
data,
field = "authors",
counting = "full",
position_weights = c(1, 0.8, 0.6, 0.4),
first_last_weight = 2,
deduplicate = TRUE,
sep = ";",
strip_quotes = TRUE
)
Arguments
data |
A data frame with |
counting |
Character. Counting method. |
position_weights |
Numeric vector for |
first_last_weight |
Numeric for |
Value
A sparse weighted bipartite matrix (works x authors).
Build a bipartite incidence matrix from bibliometric data
Description
Constructs a sparse works x entities two-mode matrix from a data frame with a list-column. This is the core engine behind all network construction functions.
Usage
build_bipartite(
data,
field,
min_freq = 1L,
deduplicate = TRUE,
sep = ";",
strip_quotes = TRUE
)
Arguments
data |
A data frame with at least columns |
field |
Character. Name of the list-column containing entities
(e.g., |
min_freq |
Integer. Minimum number of occurrences for an entity to be included. Default 1 (no filtering). |
sep |
Character. Separator used to split |
strip_quotes |
Logical. Strip surrounding quote characters from
entities. Default |
Value
A sparse dgCMatrix with rows = works (named by id) and
columns = unique entities.
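A works x entities incidence matrix can be illustrated in a few lines of base R. The sketch below uses a small dense matrix for readability, whereas build_bipartite() returns a sparse dgCMatrix; the toy data frame `works` is hypothetical, and the `unique()` call is only an analogue of `deduplicate = TRUE`.

```r
# Toy data (hypothetical): three works with a keywords list-column.
works <- data.frame(id = c("W1", "W2", "W3"), stringsAsFactors = FALSE)
works$keywords <- list(c("ml", "nlp"), c("ml", "cv"), c("nlp", "cv", "ml"))

# Long form: one row per (work, entity) pair, duplicates removed.
long <- unique(data.frame(
  work   = rep(works$id, lengths(works$keywords)),
  entity = unlist(works$keywords),
  stringsAsFactors = FALSE
))

# Fill a works x entities matrix with 1 where the entity occurs in the work.
entities <- sort(unique(long$entity))
B <- matrix(0, nrow = nrow(works), ncol = length(entities),
            dimnames = list(works$id, entities))
B[cbind(long$work, long$entity)] <- 1
B
```

Column sums of `B` give the number of works in which each entity occurs, which is the quantity a `min_freq` filter would act on.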
Build bipartite matrix from a long-format edge table
Description
Alternative constructor when data is already in long form (e.g., a two-column data frame of document-reference pairs).
Usage
build_bipartite_long(edges, min_freq = 1L)
Arguments
edges |
A data frame with columns |
min_freq |
Integer. Minimum entity frequency. Default 1. |
Value
A sparse dgCMatrix.
Build a network where entities share values from another field
Description
Build a network where entities share values from another field
Usage
build_by_network(
data,
field,
by,
counting,
similarity,
threshold,
min_occur,
top_n = NULL,
self_loops = FALSE,
deduplicate = TRUE,
strip_quotes = TRUE
)
Build time window definitions
Description
Build time window definitions
Usage
build_windows(min_time, max_time, window, step, strategy)
Build a co-occurrence network from any field
Description
With one field, entities are linked when they co-occur in the same
document. With by, entities are linked when they share values of the
by field across documents.
Usage
conetwork(
data,
field,
by = NULL,
sep = ";",
counting = "full",
similarity = "none",
threshold = 0,
min_occur = 1L,
top_n = NULL,
self_loops = FALSE,
deduplicate = TRUE,
format = "edgelist",
strip_quotes = TRUE,
id = NULL
)
Arguments
data |
A data frame with column |
field |
Character. The entity field — determines what the nodes are. |
by |
Character or |
sep |
Character or |
counting |
Character. Counting method. Default |
similarity |
Character. Normalization method. Default |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum entity frequency. Default 1. |
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
strip_quotes |
Logical. If |
id |
Optional. Name of the column to use as the work identifier
(the matrix-row dimension). If |
Details
Fields can be list-columns (already split) or character columns with
delimiters (auto-split via sep).
Value
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
Examples
data(biblio_data)
# Co-occurrence: keywords appearing in the same document
conetwork(biblio_data, "keywords")
# Authors linked by shared keywords
conetwork(biblio_data, "authors", by = "keywords")
# Keywords linked by shared authors
conetwork(biblio_data, "keywords", by = "authors")
# Journals linked by shared references (= journal coupling)
conetwork(biblio_data, "journal", by = "references", similarity = "cosine")
# Auto-splits semicolon-delimited string columns
d <- data.frame(id = 1:3, tags = c("ml; dl; nlp", "ml; cv", "dl; cv"))
conetwork(d, "tags")
Build a country network
Description
Constructs a network between countries based on collaboration or coupling.
Usage
country_network(
data,
type = "collaboration",
counting = "full",
similarity = "none",
threshold = 0,
min_occur = 1L,
attention = NULL,
top_n = NULL,
self_loops = FALSE,
deduplicate = TRUE,
format = "edgelist",
countries = "countries",
sep = ";",
references_sep = ";",
strip_quotes = TRUE,
id = NULL
)
Arguments
data |
A data frame with |
type |
Character. |
counting |
Character. Counting method. Default |
similarity |
Character. Similarity measure. Default |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum papers per country. Default 1. |
attention |
Character or NULL. Attention-based weighting independent of
|
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
countries |
Character. Name of the column containing countries.
Default |
sep |
Character. Separator used to split the entity column when it
is a plain character column rather than a list-column, e.g. |
references_sep |
Character. Separator for the |
strip_quotes |
Logical. If |
id |
Optional. Name of the column to use as the work identifier
(the matrix-row dimension). If |
Value
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
Examples
data(learning_analytics)
country_network(learning_analytics, "collaboration")
Detect bibliometric file format
Description
Detect bibliometric file format
Usage
detect_format(file)
Arguments
file |
Path to file. |
Value
Character: format name or "unknown".
Build a document network
Description
Constructs a network between documents (papers) in the dataset.
Usage
document_network(
data,
type = "coupling",
counting = "full",
similarity = "none",
threshold = 0,
min_occur = 1L,
top_n = NULL,
self_loops = FALSE,
deduplicate = TRUE,
format = "edgelist",
references = "references",
sep = ";",
strip_quotes = TRUE,
id = NULL
)
Arguments
data |
A data frame with |
type |
Character. Relationship type:
|
counting |
Character. Counting method. Default |
similarity |
Character. Similarity measure. Default |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum reference frequency. Default 1. |
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
references |
Character. Name of the column containing cited references.
Default |
sep |
Character. Separator used to split the entity column when it
is a plain character column rather than a list-column, e.g. |
strip_quotes |
Logical. If |
id |
Optional. Name of the column to use as the work identifier
(the matrix-row dimension). If |
Value
Depends on format. For type = "citation", edges are directed
(from = citing, to = cited) with weight and count both 1.
Examples
data(biblio_data)
document_network(biblio_data, "coupling")
document_network(biblio_data, "coupling", counting = "strength")
Convert edge list to a square sparse matrix
Description
Convert edge list to a square sparse matrix
Usage
edgelist_to_mat(edges, nodes = NULL, symmetric = TRUE)
Arguments
edges |
A data frame with columns |
nodes |
Optional character vector of node names. If |
symmetric |
Logical. If |
Value
A sparse dgCMatrix.
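The conversion can be pictured with a small dense matrix in base R. This sketch is illustrative only: the function itself returns a sparse dgCMatrix, and mirroring the weights to obtain a symmetric matrix is an assumption about what `symmetric = TRUE` does. It also assumes each undirected pair appears once and there are no self-loops.

```r
edges <- data.frame(
  from   = c("a", "a", "b"),
  to     = c("b", "c", "c"),
  weight = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Node set inferred from the edge endpoints.
nodes <- sort(unique(c(edges$from, edges$to)))

# Place each weight at [from, to], then mirror it to [to, from].
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
A[cbind(edges$from, edges$to)] <- edges$weight
A <- A + t(A)
A
```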
Ensure a column is a list-column, splitting if needed
Description
Ensure a column is a list-column, splitting if needed
Usage
ensure_list_column(data, field, sep = ";", strip_quotes = TRUE)
Arguments
strip_quotes |
Logical. If |
Filter edges to top-n nodes
Description
Keeps only edges between the most frequent nodes. Node frequency is determined by how many edges each node participates in.
Usage
filter_top(edges, n)
Arguments
edges |
A data frame with at least |
n |
Integer. Number of top nodes to keep. |
Value
A filtered data frame with edges among the top n nodes.
Examples
data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
# Keep only edges among the top 3 most connected authors
filter_top(edges, 3)
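The filtering rule in the Description (rank nodes by the number of edges they participate in, then keep edges whose endpoints are both in the top n) can be sketched in base R. The edge list below is hypothetical, and how the package breaks ties between equally frequent nodes is not stated in the source, so the example avoids a tie at the cut-off.

```r
edges <- data.frame(
  from = c("A", "A", "A", "B", "C"),
  to   = c("B", "C", "D", "C", "E"),
  stringsAsFactors = FALSE
)
n <- 3

# Node frequency: number of edges each node participates in.
node_freq <- sort(table(c(edges$from, edges$to)), decreasing = TRUE)
top_nodes <- names(node_freq)[seq_len(min(n, length(node_freq)))]

# Keep only edges whose two endpoints are both top nodes.
kept <- edges[edges$from %in% top_nodes & edges$to %in% top_nodes, ]
kept
```

Nodes A and C each touch three edges and B touches two, so the edges to D and E are dropped.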
Build a historiograph (chronological citation network)
Description
Constructs a Garfield-style historiograph: a directed citation network among the most locally cited documents, laid out chronologically.
Usage
historiograph(
data,
n = 30,
min_lcs = 1,
references = "references",
sep = ";",
strip_quotes = TRUE,
id = NULL
)
Arguments
data |
A data frame with |
n |
Integer. Number of top locally cited documents to include. Default 30. |
min_lcs |
Integer. Minimum local citation score for inclusion. Default 1. |
references |
Character. Name of the column containing cited references.
Default |
sep |
Character. Separator used to split the references column when it is a plain
character column. Default |
strip_quotes |
Logical. If |
id |
Optional. Name of the column to use as the work identifier.
If |
Value
A list with:
- $nodes
Data frame of included documents with id, lcs, gcs, year, title, journal, doi.
- $edges
Data frame of directed citation edges with from (citing), to (cited), year_from, year_to.
Examples
data(biblio_data)
h <- historiograph(biblio_data, n = 5)
h$nodes
h$edges
Build an institution network
Description
Constructs a network between institutions (affiliations).
Usage
institution_network(
data,
type = "collaboration",
counting = "full",
similarity = "none",
threshold = 0,
min_occur = 1L,
attention = NULL,
top_n = NULL,
self_loops = FALSE,
deduplicate = TRUE,
format = "edgelist",
affiliations = "affiliations",
sep = ";",
references_sep = ";",
strip_quotes = TRUE,
id = NULL
)
Arguments
data |
A data frame with |
type |
Character. |
counting |
Character. Counting method. Default |
similarity |
Character. Similarity measure. Default |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum papers per institution. Default 1. |
attention |
Character or NULL. Attention-based weighting independent of
|
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
affiliations |
Character. Name of the column containing institutions.
Default |
sep |
Character. Separator used to split the entity column when it
is a plain character column rather than a list-column, e.g. |
references_sep |
Character. Separator for the |
strip_quotes |
Logical. If |
id |
Optional. Name of the column to use as the work identifier
(the matrix-row dimension). If |
Value
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
Examples
data(learning_analytics)
institution_network(learning_analytics, "collaboration")
Build a keyword co-occurrence network
Description
Constructs a network where two keywords are linked when they appear together in the same document.
Usage
keyword_network(
data,
keywords = "keywords",
counting = "full",
similarity = "none",
threshold = 0,
min_occur = 1L,
attention = NULL,
top_n = NULL,
self_loops = FALSE,
deduplicate = TRUE,
format = "edgelist",
sep = ";",
strip_quotes = TRUE,
field = NULL,
id = NULL
)
Arguments
data |
A data frame with |
keywords |
Character. Name of the keyword column (list-column or
delimited string). Default |
counting |
Character. Counting method. Default |
similarity |
Character. Similarity measure. Default |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum keyword frequency. Default 1. |
attention |
Character or NULL. Attention-based weighting independent of
|
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
sep |
Character. Separator used to split the entity column when it
is a plain character column rather than a list-column, e.g. |
strip_quotes |
Logical. If |
field |
Deprecated. Use |
id |
Optional. Name of the column to use as the work identifier
(the matrix-row dimension). If |
Value
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
Examples
data(biblio_data)
keyword_network(biblio_data)
keyword_network(biblio_data, similarity = "association")
# Custom CSV: any column name, any separator
d <- data.frame(id = 1:3, Tags = c("ml, ai", "ml, nlp", "ai, nlp"))
keyword_network(d, keywords = "Tags", sep = ",")
Learning analytics dataset (OpenAlex)
Description
A corpus of 1,508 gold open-access scholarly works on learning analytics,
retrieved from OpenAlex (CC0 licence). All records have a verified title,
publication year, and at least one author. Journal names are present for
works published in a named source; preprints and book chapters may have
NA in journal.
Usage
learning_analytics
Format
A data frame with 1,508 rows and 11 columns:
- id
OpenAlex work ID (e.g. "W2769342982").
- title
Work title.
- year
Publication year (integer).
- journal
Source name, or NA if not available.
- doi
DOI string without the https://doi.org/ prefix, or NA.
- cited_by_count
Number of citing works as recorded in OpenAlex.
- type
Work type ("article", "review", "preprint", "book-chapter", etc.).
- authors
List-column of author display names (pipe-split from the OpenAlex flat export; one name per authorship slot).
- keywords
List-column with one element: the primary OpenAlex topic for the work (e.g. "Online Learning and Analytics").
- affiliations
List-column of institution display names (one entry per authorship–institution pair).
- countries
List-column of two-letter ISO country codes (one entry per authorship–institution pair).
Source
OpenAlex https://openalex.org, CC0 licence.
Examples
data(learning_analytics)
author_network(learning_analytics, "collaboration")
country_network(learning_analytics, "collaboration")
Compute local citation scores
Description
Counts how many times each document is cited by other documents within the dataset.
Usage
local_citations(
data,
references = "references",
sep = ";",
strip_quotes = TRUE,
id = NULL
)
Arguments
data |
A data frame with |
references |
Character. Name of the column containing cited references.
Default |
sep |
Character. Separator used to split the references column when it is a plain
character column. Default |
strip_quotes |
Logical. If |
id |
Optional. Name of the column to use as the work identifier.
If |
Value
A data frame with columns:
- id
Document identifier.
- lcs
Local Citation Score: times cited within the dataset.
- gcs
Global Citation Score: cited_by_count if available.
Plus any metadata columns present in the input (title, year,
journal, doi).
Examples
data(biblio_data)
local_citations(biblio_data)
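The local citation count itself is simple to reproduce. The base-R sketch below counts, for each document, how many other documents in a toy dataset cite it; the data frame `docs` is hypothetical, and counting each reference at most once per citing document is an assumption.

```r
# Toy data (hypothetical): W2-W4 cite earlier works; "X9" is outside the dataset.
docs <- data.frame(id = c("W1", "W2", "W3", "W4"), stringsAsFactors = FALSE)
docs$references <- list("X9", c("W1", "X9"), c("W1", "W2"), c("W1", "W2", "W3"))

# Flatten all reference lists, counting a reference once per citing document.
cited <- unlist(lapply(docs$references, unique))

# Keep only references that point to documents inside the dataset.
local_hits <- cited[cited %in% docs$id]

# Tabulate against the full id set so uncited documents get a score of 0.
lcs <- data.frame(
  id  = docs$id,
  lcs = as.integer(table(factor(local_hits, levels = docs$id))),
  stringsAsFactors = FALSE
)
lcs
```

The citation to X9 is ignored because X9 is not a document in the dataset, which is the difference between a local and a global score.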
Convert a sparse square matrix to a tidy edge list
Description
Extracts non-zero entries directly from the sparse representation — no dense matrix allocation. For undirected networks, only the upper triangle is returned.
Usage
mat_to_edgelist(A, directed = FALSE)
Arguments
A |
A square sparse or dense matrix. |
directed |
Logical. If |
Value
A data frame with columns from, to, weight, sorted by
descending weight.
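For an undirected network, the output shape can be illustrated with base R on a small dense matrix. This is only a sketch of the result (upper triangle, non-zero entries, sorted by descending weight); the function itself works on the sparse representation, and the matrix `A` below is hypothetical.

```r
A <- matrix(c(0, 2, 1,
              2, 0, 3,
              1, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Positions of the non-zero entries in the upper triangle.
idx <- which(upper.tri(A) & A != 0, arr.ind = TRUE)

# One row per edge, then sort by descending weight.
edges <- data.frame(
  from   = rownames(A)[idx[, "row"]],
  to     = colnames(A)[idx[, "col"]],
  weight = A[idx],
  stringsAsFactors = FALSE
)
edges <- edges[order(-edges$weight), ]
rownames(edges) <- NULL
edges
```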
Construct a co-occurrence network via two-mode multiplication
Description
The unified engine for all bibliometric networks. Operates entirely in sparse representation — never allocates a dense n x n matrix.
Usage
multiply_bipartite(
B,
mode = "columns",
similarity = "none",
threshold = 0,
top_n = NULL,
self_loops = FALSE
)
Arguments
B |
A sparse bipartite matrix (works x entities), already weighted by the counting method. |
mode |
Character. |
similarity |
Character. Normalization method (see |
threshold |
Numeric. Minimum edge weight to retain. |
top_n |
Integer or |
Value
A data frame with columns from, to, weight, count.
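The two-mode multiplication behind the title can be shown with base R on a small dense matrix. This sketch only illustrates the two possible projections of a works x entities matrix; the function itself stays sparse, and which projection each value of `mode` selects is not stated in the text above, so the variable names `co_entities` and `co_works` are illustrative.

```r
# Works x entities incidence matrix (hypothetical toy data).
B <- matrix(c(0, 1, 1,
              1, 1, 0,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("W1", "W2", "W3"), c("cv", "ml", "nlp")))

# Column projection, t(B) %*% B: how many works each pair of entities shares.
co_entities <- crossprod(B)

# Row projection, B %*% t(B): how many entities each pair of works shares.
co_works <- tcrossprod(B)

co_entities
co_works
```

The diagonal of `co_entities` holds each entity's total occurrence count, which is the quantity normalize() reads from the diagonal.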
Normalize a co-occurrence matrix
Description
Applies a similarity normalization to a square co-occurrence matrix. The diagonal of the input matrix is used as the total occurrence count for each item. Operates entirely in sparse representation.
Usage
normalize(A, method = "none")
Arguments
A |
A square symmetric matrix (dense or sparse) representing co-occurrence counts. |
method |
Character. Normalization method:
|
Value
A normalized sparse matrix of the same dimensions.
Examples
# Create a small co-occurrence matrix
A <- matrix(c(10, 3, 1, 3, 8, 2, 1, 2, 5), nrow = 3,
dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
normalize(A, "association")
normalize(A, "cosine")
normalize(A, "jaccard")
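For orientation, the three measures used in these examples have well-known textbook forms that can be written directly in base R, using the diagonal as the occurrence counts as the Description states. The formulas below are the standard definitions; whether normalize() applies exactly these forms (for instance any extra scaling or special handling of the diagonal) is an assumption.

```r
A <- matrix(c(10, 3, 1, 3, 8, 2, 1, 2, 5), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Total occurrences of each item, taken from the diagonal.
occ <- diag(A)

# Association strength: co-occurrence divided by the product of occurrences.
association <- A / outer(occ, occ)

# Cosine: co-occurrence divided by the geometric mean of occurrences.
cosine <- A / sqrt(outer(occ, occ))

# Jaccard: co-occurrence divided by the size of the union.
jaccard <- A / (outer(occ, occ, "+") - A)

round(cosine, 3)
round(jaccard, 3)
```

For the pair (a, b), with 3 co-occurrences and 10 and 8 total occurrences, these forms give 3/80 for association strength, 3/sqrt(80) for cosine, and 3/15 for Jaccard.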
Reorder and parse author names
Description
Converts author names to "First Last" order and breaks each name
into its components. The parser is aware of nobiliary particles
(van, von, de, del, da, der, ...) and generational
suffixes (Jr, Sr, II, III, IV), and is case-insensitive so
it handles bibnets' uppercased entity labels.
Usage
parse_names(
x,
format = c("first_last", "last_initials", "last"),
surname_first = c("auto", "yes", "no")
)
Arguments
x |
Character vector of author names, one name per element.
|
format |
Output style for personal names (group/corporate
authors,
|
surname_first |
How to read comma-less strings (strings with
a comma are always
May also be given as the logical |
Details
Three name conventions are recognised:
- "Last, First" (a comma): always parsed as surname-then-given.
- "SURNAME Initials" (no comma, e.g. "WANG Y", "AYALA-ROMERO JA"): the Scopus / bibnets author-label form.
- "First Last" (no comma, e.g. "Mohammed Saqr").
Comma-less strings that look like group or corporate authors
(e.g. "WHO Collaborating Group") are detected and left untouched, as
are NA and empty strings.
This is an optional, standalone utility. No reader or network builder
in bibnets calls it; entity labels are matched verbatim unless you
choose to apply this function yourself first.
Value
A character vector the same length as x, formatted per
format. The parsed components are attached as the attribute
"parts" (independent of format): a data frame with columns
original, first, last, particle, suffix, and type (one
of "person", "organization", "empty", "missing"). Casing of
the input is preserved; periods are stripped from parsed initials.
Input shape
parse_names() takes a flat character vector (one name per
element) — not a data frame and not a list. bibnets readers store
authors as a list-column (each element is a character vector,
because a paper has a variable number of authors), so map the
function over it rather than passing the column directly:
df$authors <- lapply(df$authors, parse_names, format = "last_initials")
For an ordinary flat character column (or the from / to columns of
a bibnets_network), call it directly: parse_names(df$col).
Recommended workflow
Normalise names before building a network, on the reader's
authors list-column. Node identity in bibnets is fixed when the
bipartite matrix is built (labels are upper-cased and matched
verbatim), so two spellings of one author ("Saqr, Mohammed" and
"SAQR M") only merge into a single node if they are normalised
before author_network() is called:
d <- read_biblio("scopus.csv")
d$authors <- lapply(d$authors, parse_names, format = "last_initials")
net <- author_network(d, type = "collaboration")
Applying to an existing edgelist
You can call parse_names() on the from / to (or
source / target) columns of a built network, but it is a
per-column, graph-blind relabelling: edges, pairing, weight and
count are preserved, with two caveats:
- Apply the same call to both endpoint columns; otherwise the two ends use different labels.
- The mapping is many-to-one, so distinct authors can collapse onto one label (especially with "last_initials"), and bibnets does not re-aggregate the resulting duplicate edges.
Prefer the pre-build workflow above.
Limitations
Comma-less names are inherently ambiguous; the auto heuristic is
biased toward the bibnets/Scopus surname-first convention and may
misread uppercase "GIVEN SURNAME" where the surname is 1-3 letters
(e.g. "MOHAMMED LI"). Malformed suffix-first input
("Jr., Sammy Davis") is not specially handled. Use surname_first
to force interpretation when you know the source convention.
See Also
author_network() and read_biblio() for the upstream
stage where normalisation is best applied.
Examples
parse_names(c("Saqr, Mohammed", "Lopez-Pernas, Sonsoles"))
# Alternative output styles
parse_names("Saqr, Mohammed", format = "last_initials") # "Saqr M."
parse_names("Saqr, Mohammed", format = "last") # "Saqr"
parse_names("van der Berg, Jan", format = "last_initials") # "van der Berg J."
x <- parse_names("Saqr, M.")
x
attr(x, "parts")
# Particles and suffixes
parse_names(c("van der Berg, Jan", "Smith, John, Jr.", "de la Cruz, Ana"))
# Scopus / bibnets surname-first labels are detected automatically
parse_names(c("WANG Y", "AYALA-ROMERO JA", "VAN DER BERG J"))
parse_names("WANG Y", format = "last_initials") # "WANG Y."
# Override the auto heuristic when you know the convention
parse_names("Wang Yong", surname_first = "yes") # "Yong Wang"
# Group authors are detected and left unchanged
parse_names("WHO Collaborating Group")
# Recommended workflow: normalise the authors list-column, then build
papers <- data.frame(id = c("P1", "P2", "P3"), stringsAsFactors = FALSE)
papers$authors <- list(
c("Saqr, Mohammed", "Lopez, Ana"),
c("SAQR M", "Lopez, Ana"),
c("Saqr, Mohammed", "Chen, Wei"))
papers$authors <- lapply(papers$authors, parse_names,
format = "last_initials")
net <- author_network(papers, type = "collaboration")
net
List of all available counting methods
Description
List of all available counting methods
Usage
position_independent_counts()
Value
Character vector of method names.
Print a bibnets network edge list
Description
Print a bibnets network edge list
Usage
## S3 method for class 'bibnets_network'
print(x, n = 10L, ...)
Arguments
x |
A |
n |
Integer. Number of rows to show. Default 10. |
... |
Ignored. |
Value
Invisibly returns x.
Examples
data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
print(edges)
Prune a weighted edge list
Description
Reduces a weighted edge list by removing weak or excess edges.
Usage
prune(edges, threshold = NULL, top_n = NULL)
Arguments
edges |
A data frame with at least columns |
threshold |
Numeric. Keep only edges with |
top_n |
Integer. For each node, keep only its |
Value
The filtered edge data frame (same columns as input).
Examples
edges <- data.frame(
from = c("A","A","A","B","B","C"),
to = c("B","C","D","C","D","D"),
weight = c(5, 1, 2, 4, 1, 3)
)
# Keep only edges with weight >= 3
prune(edges, threshold = 3)
# Keep the 2 strongest edges per node
prune(edges, top_n = 2)
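Both pruning rules can be reproduced in base R on the same edge list. This is a sketch rather than the package's implementation: in particular, keeping an edge when it ranks among the `top_n` strongest edges of at least one endpoint is an assumption (in this example, requiring both endpoints gives the same result).

```r
edges <- data.frame(
  from   = c("A", "A", "A", "B", "B", "C"),
  to     = c("B", "C", "D", "C", "D", "D"),
  weight = c(5, 1, 2, 4, 1, 3),
  stringsAsFactors = FALSE
)

# Rule 1: keep edges with weight >= threshold.
by_threshold <- edges[edges$weight >= 3, ]

# Rule 2: for each node, mark its top_n strongest incident edges.
top_n <- 2
keep  <- rep(FALSE, nrow(edges))
for (v in unique(c(edges$from, edges$to))) {
  touching  <- which(edges$from == v | edges$to == v)
  strongest <- touching[order(-edges$weight[touching])]
  keep[strongest[seq_len(min(top_n, length(strongest)))]] <- TRUE
}
by_top_n <- edges[keep, ]

by_threshold
by_top_n
```

The per-node rule keeps A-D (weight 2) because it is A's second-strongest edge, even though it falls below the threshold of 3 used in the first rule.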
Read bibliometric data
Description
Universal reader that handles files, folders, format detection, and generic CSV input. Accepts a single file, multiple files, or a directory.
Usage
read_biblio(
path,
format = "auto",
id = NULL,
authors = NULL,
keywords = NULL,
references = NULL,
countries = NULL,
affiliations = NULL,
journal = NULL,
sep = ";",
list_cols = NULL,
...,
actors = NULL
)
Arguments
path |
Character. Path to a file, a vector of file paths, or a directory containing export files. |
format |
Character. File format:
|
id |
Character. Column name for document identifier. Only used
when |
authors, keywords, references, countries, affiliations |
Character. For
|
journal |
Character. For |
sep |
Character. Delimiter for splitting the mapped multi-valued
columns. Default |
list_cols |
Character vector. For |
... |
Additional arguments passed to the format-specific reader. |
actors |
Deprecated. Use the entity arguments ( |
Value
A data frame.
Examples
# Auto-detect format from file content (here: a bundled OpenAlex CSV)
f <- system.file("extdata", "openalex_works.csv", package = "bibnets")
data <- read_biblio(f)
head(data[, c("id", "title", "year", "journal")])
# Read multiple files at once; auto-detects each format
f_scopus <- system.file("extdata", "scopus_sample.csv", package = "bibnets")
f_wos <- system.file("extdata", "wos_sample.txt", package = "bibnets")
combined <- read_biblio(c(f_scopus, f_wos))
head(combined[, c("id", "title", "year", "journal")])
# Read every supported export in a directory (here: the bundled extdata)
folder <- system.file("extdata", package = "bibnets")
all_data <- read_biblio(folder)
nrow(all_data)
# Custom CSV: map each source column onto a standard field by name.
# Naming columns implies format = "generic" (no need to pass it).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(
doc_id = c("a", "b"),
Authors = c("Smith J; Jones A", "Davis M"),
Keywords = c("networks; bibliometrics", "analytics")
), tmp, row.names = FALSE)
generic <- read_biblio(tmp,
id = "doc_id",
authors = "Authors",
keywords = "Keywords",
sep = ";")
head(generic)
Read a BibTeX file
Description
Parses a .bib file into a standardized bibliometric data frame.
Note: standard BibTeX does not contain cited references, so the
references column will be empty unless the file includes a
non-standard cited-references or note field with reference data.
Usage
read_bibtex(file, encoding = "UTF-8")
Arguments
file |
Path to a |
encoding |
Character. File encoding. Default |
Value
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references (typically empty for
BibTeX), and keywords.
Examples
# Write a minimal BibTeX entry to a temp file, then read it
bib <- '@article{smith2020,
title = {Bibliometric networks},
author = {Smith, J. and Jones, K.},
journal = {Test Journal},
year = {2020},
doi = {10.1000/test}
}'
f <- tempfile(fileext = ".bib")
writeLines(bib, f)
data <- read_bibtex(f)
data[, c("id", "title", "year", "journal", "doi")]
unlink(f)
Convert Crossref API data to bibnets format
Description
Takes the output of rcrossref::cr_works() (the $data tibble/data frame)
and converts it to the standardized bibnets format.
Usage
read_crossref(data)
Arguments
data |
A data frame from |
Value
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
Examples
# Construct a minimal data frame matching the structure of
# rcrossref::cr_works(...)$data. In practice, pass that data frame directly.
raw <- data.frame(
doi = c("10.1/a", "10.2/b"),
title = c("First paper", "Second paper"),
issued = c("2022-01-01", "2021-06-15"),
container.title = c("Journal A", "Journal B"),
is.referenced.by.count = c("3", "9"),
type = c("journal-article", "journal-article"),
stringsAsFactors = FALSE
)
raw$author <- list(
data.frame(given = c("Jane", "Anne"),
family = c("Smith", "Jones"),
stringsAsFactors = FALSE),
data.frame(given = "Mark", family = "Davis", stringsAsFactors = FALSE)
)
data <- read_crossref(raw)
head(data[, c("id", "title", "year", "journal")])
Read Dimensions CSV export
Description
Parses a CSV file exported from Dimensions into a standardized bibliometric data frame.
Usage
read_dimensions(file, encoding = "UTF-8")
Arguments
file |
Path to a Dimensions CSV export file. |
encoding |
Character. File encoding. Default |
Value
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
Dimensions-specific extras: affiliations (list-column),
countries (list-column).
Examples
f <- system.file("extdata", "dimensions_sample.csv", package = "bibnets")
data <- read_dimensions(f)
head(data[, c("id", "title", "year", "journal")])
Read a generic CSV with user-specified columns
Description
Read a generic CSV with user-specified columns
Usage
read_generic(
file,
id = NULL,
sep = ";",
authors = NULL,
keywords = NULL,
references = NULL,
countries = NULL,
affiliations = NULL,
journal = NULL,
list_cols = NULL
)
Read Lens.org CSV export
Description
Parses a CSV file exported from Lens.org into a standardized bibliometric data frame.
Usage
read_lens(file, encoding = "UTF-8")
Arguments
file |
Path to a Lens.org CSV export file. |
encoding |
Character. File encoding. Default |
Value
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
Examples
f <- system.file("extdata", "lens_sample.csv", package = "bibnets")
data <- read_lens(f)
head(data[, c("id", "title", "year", "journal")])
Convert OpenAlex data to bibnets format
Description
Takes the output of openalexR::oa_fetch() (a tibble/data frame of works)
and converts it to the standardized bibnets format with list-columns.
Usage
read_openalex(data)
Arguments
data |
A data frame from |
Value
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
Examples
# Construct a minimal data frame matching the structure returned by
# openalexR::oa_fetch(entity = "works", ...). In practice, pass the
# result of oa_fetch() directly.
raw <- data.frame(
id = c("W123", "W456"),
display_name = c("First paper", "Second paper"),
publication_year = c(2022L, 2021L),
so = c("Journal A", "Journal B"),
doi = c("https://doi.org/10.1/a", "https://doi.org/10.2/b"),
cited_by_count = c(5L, 12L),
stringsAsFactors = FALSE
)
raw$author <- list(
data.frame(au_display_name = c("Smith J", "Jones A"),
stringsAsFactors = FALSE),
data.frame(au_display_name = "Davis M", stringsAsFactors = FALSE)
)
raw$referenced_works <- list(c("W100", "W200"), "W123")
data <- read_openalex(raw)
head(data[, c("id", "title", "year", "journal", "doi")])
Read a flat OpenAlex CSV export
Description
Reads the flat CSV format downloaded directly from the OpenAlex website
(openalex.org/works exports). Multi-value fields are pipe-delimited (|).
This is distinct from the nested tibble produced by openalexR::oa_fetch(),
which is handled by read_openalex().
Usage
read_openalex_csv(file, sep = "|")
Arguments
file |
Path to the CSV file. |
sep |
Character. Delimiter for multi-value fields. Default |
Value
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, keywords, affiliations,
countries. abstract and references are always NA / empty
(not available in the flat export).
Examples
f <- system.file("extdata", "openalex_works.csv", package = "bibnets")
data <- read_openalex_csv(f)
Read an RIS file
Description
Parses a .ris file into a standardized bibliometric data frame.
Like BibTeX, standard RIS does not include cited references.
Usage
read_ris(file, encoding = "UTF-8")
Arguments
file |
Path to a |
encoding |
Character. File encoding. Default |
Value
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references (typically empty for
RIS), and keywords.
Examples
# Write a minimal RIS record to a temp file, then read it
ris <- "TY - JOUR
AU - Smith, J.
AU - Jones, K.
TI - Bibliometric networks
JO - Test Journal
PY - 2020
DO - 10.1000/test
ER - "
f <- tempfile(fileext = ".ris")
writeLines(ris, f)
data <- read_ris(f)
data[, c("id", "title", "year", "journal", "doi")]
unlink(f)
Read Scopus CSV export
Description
Parses a CSV file exported from Scopus into a standardized bibliometric data frame with list-columns for multi-valued fields.
Usage
read_scopus(file, encoding = "UTF-8")
Arguments
file |
Path to a Scopus CSV export file. |
encoding |
Character. File encoding. Default |
Value
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
Scopus-specific extras: index_keywords (list-column),
affiliations (character), language (character).
Examples
f <- system.file("extdata", "scopus_sample.csv", package = "bibnets")
data <- read_scopus(f)
head(data[, c("id", "title", "year", "journal")])
Read a single bibliometric file
Description
Read a single bibliometric file
Usage
read_single_biblio(
file,
format,
id,
sep,
authors = NULL,
keywords = NULL,
references = NULL,
countries = NULL,
affiliations = NULL,
journal = NULL,
list_cols = NULL,
...
)
Read Web of Science plaintext or tab-delimited export
Description
Parses a Web of Science export file (plaintext or tab-delimited) into a standardized bibliometric data frame.
Usage
read_wos(file, format = "plaintext")
Arguments
file |
Path to a WoS export file (.txt). |
format |
Character. |
Value
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
WoS-specific extra: keywords_plus (list-column).
Examples
f <- system.file("extdata", "wos_sample.txt", package = "bibnets")
data <- read_wos(f)
head(data[, c("id", "title", "year", "journal")])
Build a reference network
Description
Constructs a co-citation or equivalence network among cited references. Two references are linked when they are cited together by the same paper.
Usage
reference_network(
data,
type = "co_citation",
counting = "full",
similarity = "none",
threshold = 0,
min_occur = 1L,
top_n = NULL,
self_loops = FALSE,
deduplicate = TRUE,
format = "edgelist",
references = "references",
sep = ";",
strip_quotes = TRUE,
id = NULL
)
Arguments
data |
A data frame with |
type |
Character. |
counting |
Character. Counting method. Default |
similarity |
Character. Similarity measure. Default |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum times a reference must be cited. Default 1. |
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
references |
Character. Name of the column containing cited references.
Default |
sep |
Character. Separator used to split the entity column when it
is a plain character column rather than a list-column, e.g. |
strip_quotes |
Logical. If |
id |
Optional. Name of the column to use as the work identifier
(the matrix-row dimension). If |
Value
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
Examples
data(biblio_data)
reference_network(biblio_data)
reference_network(biblio_data, similarity = "association")
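The co-citation rule stated in the Description (two references are linked when the same paper cites both) can be illustrated with a small base-R sketch. It assumes plain full counting with no similarity normalisation, uses made-up document and reference labels, and is not the package's implementation.

```r
# Toy data: three documents and the references each one cites
refs <- list(
  d1 = c("R1", "R2", "R3"),
  d2 = c("R1", "R2"),
  d3 = c("R2", "R3")
)

# Long format: one row per (document, reference) pair
long <- data.frame(
  doc = rep(names(refs), lengths(refs)),
  ref = unlist(refs, use.names = FALSE)
)

# Binary document-by-reference incidence matrix
inc <- unclass(table(long$doc, long$ref))
inc[inc > 0] <- 1L

# Reference-by-reference co-citation counts
cocit <- crossprod(inc)
diag(cocit) <- 0

# Upper triangle as an undirected edge list
idx <- which(upper.tri(cocit) & cocit > 0, arr.ind = TRUE)
cocitation_edges <- data.frame(
  from   = rownames(cocit)[idx[, 1]],
  to     = colnames(cocit)[idx[, 2]],
  weight = cocit[idx]
)
```

Each weight is the number of documents that cite both references, so R1 and R2 (cited together by d1 and d2) receive weight 2.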
Resolve the work-identifier column
Description
Materializes a top-level id column that the network pipeline uses to
index works (matrix rows). The resolution rules are listed under Details.
Usage
resolve_id(data, id = NULL)
Arguments
data |
A data frame. |
id |
|
Details
- id = NULL (default): use the existing id column if one is present, otherwise fall back to row numbers (seq_len(nrow(data))).
- id = "colname": copy the named column to id. The column must exist.
When id names a column other than "id" and the data already has a
distinct "id" column, the request is ambiguous (the existing "id"
column might itself be an entity field). Rather than silently overwriting
it, this errors and asks the caller to resolve the conflict.
Value
data with a guaranteed character id column.
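The resolution rules in Details can be sketched as follows. resolve_id_sketch() is a hypothetical stand-in written for illustration, and its error messages are invented; the package's own function may word them differently or apply further checks.

```r
resolve_id_sketch <- function(data, id = NULL) {
  if (is.null(id)) {
    # Default: keep an existing id column, else fall back to row numbers
    if (!"id" %in% names(data)) data$id <- seq_len(nrow(data))
  } else {
    if (!id %in% names(data)) {
      stop("Column '", id, "' not found in data.")
    }
    # Refuse to silently overwrite an existing id column
    if (id != "id" && "id" %in% names(data)) {
      stop("Ambiguous request: data already has an 'id' column.")
    }
    data$id <- data[[id]]
  }
  # The result always carries a character id column
  data$id <- as.character(data$id)
  data
}

df <- data.frame(eid = c("x1", "x2"), title = c("A", "B"))
by_row  <- resolve_id_sketch(df)              # ids from row numbers
by_name <- resolve_id_sketch(df, id = "eid")  # ids copied from eid
```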
Resolve file paths from a file, vector of files, or directory
Description
Resolve file paths from a file, vector of files, or directory
Usage
resolve_paths(path)
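A rough base-R sketch of this kind of path resolution is shown here. resolve_paths_sketch() is hypothetical; it assumes that a directory expands to every file it contains, whereas the package may filter by extension or recurse into subfolders.

```r
resolve_paths_sketch <- function(path) {
  # A single directory expands to the files inside it
  if (length(path) == 1 && dir.exists(path)) {
    return(list.files(path, full.names = TRUE))
  }
  # Otherwise treat the input as one or more file paths
  missing_files <- path[!file.exists(path)]
  if (length(missing_files) > 0) {
    stop("File(s) not found: ", paste(missing_files, collapse = ", "))
  }
  path
}

# Demonstration with a temporary directory holding two empty files
d <- tempfile("exports")
dir.create(d)
invisible(file.create(file.path(d, c("a.csv", "b.bib"))))
found <- basename(resolve_paths_sketch(d))
```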
Scopus dataset — Green Cloud Computing and Quantization (2020–2025)
Description
First 500 records from a Scopus bibliometric export on the intersection of green cloud computing and quantization, covering 2020–2025. Includes full references, author keywords, index keywords, and affiliations.
Usage
scopus_quantum_cloud
Format
A data frame with 499 rows and 12 columns:
- id
Scopus EID.
- title
Work title.
- year
Publication year (integer).
- journal
Source title.
- doi
DOI string without the https://doi.org/ prefix.
- cited_by_count
Times cited in Scopus.
- abstract
Abstract text.
- type
Document type ("Article", "Review", etc.).
- authors
List-column of author name strings.
- references
List-column of cited reference strings.
- keywords
List-column of author keywords.
- affiliations
List-column of affiliation strings.
Source
Scopus bibliometric export. Dataset archived at doi:10.5281/zenodo.17142636 (CC BY 4.0).
Examples
data(scopus_quantum_cloud)
author_network(scopus_quantum_cloud, "collaboration")
keyword_network(scopus_quantum_cloud)
document_network(scopus_quantum_cloud, "coupling", similarity = "cosine")
Build a source (journal) network
Description
Constructs a network between publication sources (journals, book series).
Usage
source_network(
data,
type = "coupling",
counting = "full",
similarity = "none",
threshold = 0,
min_occur = 1L,
top_n = NULL,
self_loops = FALSE,
deduplicate = TRUE,
format = "edgelist",
journal = "journal",
sep = ";",
references_sep = ";",
strip_quotes = TRUE,
id = NULL
)
Arguments
data |
A data frame with |
type |
Character. |
counting |
Character. Counting method. Default |
similarity |
Character. Similarity measure. Default |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum papers per source. Default 1. |
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
journal |
Character. Name of the column containing the publication
source. Default |
sep |
Character. Separator used to split the entity column when it
is a plain character column rather than a list-column, e.g. |
references_sep |
Character. Separator for the |
strip_quotes |
Logical. If |
id |
Optional. Name of the column to use as the work identifier
(the matrix-row dimension). If |
Value
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
Examples
data(biblio_data)
source_network(biblio_data, "coupling")
Parse semicolon-delimited strings into list-column
Description
Splits semicolon-separated strings (common in Scopus/WoS exports) into character vectors, trimming whitespace.
Usage
split_field(x, sep = ";")
Arguments
x |
A character vector of semicolon-delimited strings. |
sep |
Character. Delimiter. Default |
Value
A list of character vectors.
Examples
split_field(c("Alice; Bob; Carol", "Dave; Eve"))
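The same behaviour can be approximated with strsplit() and trimws(). split_sketch() is a hypothetical stand-in for illustration only; how the package treats NA or empty strings is not covered here.

```r
split_sketch <- function(x, sep = ";") {
  # fixed = TRUE treats sep literally, so "|" or "." need no escaping
  lapply(strsplit(x, sep, fixed = TRUE), trimws)
}

parts <- split_sketch(c("Alice; Bob; Carol", "Dave; Eve"))
lengths(parts)
```

The result is a list with one character vector per input string, which is the list-column shape the network functions expect.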
Standardize author names
Description
Converts names to uppercase, normalises whitespace, and removes dots from
initials (F.J. → FJ). Name order and format are preserved, consistent with
how bibliometrix handles multi-source data.
Usage
standardize_authors(x, flip_names = FALSE)
Arguments
x |
Character vector of author names. |
flip_names |
Logical. If |
Value
Character vector, uppercased and cleaned.
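The three cleaning steps named in the Description can be sketched in base R. standardize_sketch() is hypothetical: it removes every dot rather than only those in initials, and it does not implement flip_names.

```r
standardize_sketch <- function(x) {
  x <- toupper(x)                      # uppercase
  x <- gsub(".", "", x, fixed = TRUE)  # drop dots (simplification: all dots)
  x <- gsub("[[:space:]]+", " ", x)    # collapse runs of whitespace
  trimws(x)                            # strip leading/trailing whitespace
}

cleaned <- standardize_sketch(c("Smith,  F.J.", " jones a. "))
cleaned
```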
Strip surrounding quote characters from entity labels
Description
Removes leading/trailing double-quote characters (straight ", the CSV
doubled "", and curly quotes) plus surrounding whitespace, so quoted
values such as "Alice" or ""Bob"" become Alice / Bob. Quotes
inside a label (e.g. an apostrophe in O'Brien) are left untouched.
Usage
strip_surrounding_quotes(x)
Arguments
x |
Character vector. |
Value
Character vector with surrounding quotes/whitespace removed.
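A regular-expression sketch of the same idea follows. strip_quotes_sketch() is a hypothetical helper; it assumes the set of quote characters is the straight double quote plus the left and right curly double quotes, which may be narrower or wider than what the package strips.

```r
strip_quotes_sketch <- function(x) {
  # Runs of whitespace, straight double quotes, or curly double quotes
  edge <- "[[:space:]\"\u201c\u201d]+"
  # Remove such runs only at the start or the end of each string
  gsub(paste0("^", edge, "|", edge, "$"), "", x)
}

labels <- c("\"Alice\"", "\"\"Bob\"\"", " O'Brien ")
stripped <- strip_quotes_sketch(labels)
stripped
```

Because the pattern is anchored to the string edges, the apostrophe inside O'Brien is never touched.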
Summarise a bibnets network
Description
Summarise a bibnets network
Usage
## S3 method for class 'bibnets_network'
summary(object, ...)
Arguments
object |
A |
... |
Ignored. |
Value
Invisibly returns object.
Examples
data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
summary(edges)
Build time-windowed networks
Description
Splits data by time windows and builds a separate network for each window using any network function.
Usage
temporal_network(
data,
network_fun,
...,
window = 3,
step = NULL,
strategy = "fixed",
time_col = "year"
)
Arguments
data |
A data frame with a numeric time column. |
network_fun |
Function or character string naming a network function
(e.g., |
... |
Additional arguments passed to |
window |
Integer. Width of each time window in units of the time column (years, months, quarters, etc.). Default 3. |
step |
Integer or |
strategy |
Character. Time window strategy:
|
time_col |
Character. Name of the column containing the time variable.
Default |
Value
A named list of data frames (edge lists). Names are window
labels like "2018-2020".
Examples
data(biblio_data)
# Fixed 3-year windows
temporal_network(biblio_data, author_network, "collaboration")
# Sliding window
temporal_network(biblio_data, author_network, "collaboration",
window = 2, strategy = "sliding")
# Cumulative
temporal_network(biblio_data, reference_network,
threshold = 0, strategy = "cumulative", window = 2)
# With string name
temporal_network(biblio_data, "keyword_network", window = 3)
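To make the three strategies concrete, here is a sketch of how window boundaries could be derived from a vector of years. window_bounds() is a hypothetical helper, and the exact rules are assumptions: fixed windows do not overlap, sliding windows advance one unit at a time, and cumulative windows share the first year while their end grows by the window width. The package's step argument and edge handling may differ.

```r
window_bounds <- function(years, window = 3, strategy = "fixed") {
  lo <- min(years)
  hi <- max(years)
  if (strategy == "fixed") {
    start <- seq(lo, hi, by = window)
    end <- pmin(start + window - 1, hi)
  } else if (strategy == "sliding") {
    start <- seq(lo, max(lo, hi - window + 1), by = 1)
    end <- pmin(start + window - 1, hi)
  } else if (strategy == "cumulative") {
    end <- seq(min(lo + window - 1, hi), hi, by = window)
    if (end[length(end)] < hi) end <- c(end, hi)  # cover the final years
    start <- rep(lo, length(end))
  } else {
    stop("Unknown strategy: ", strategy)
  }
  data.frame(label = paste(start, end, sep = "-"), start = start, end = end)
}

years <- 2018:2023
fixed_w      <- window_bounds(years, window = 3, strategy = "fixed")
sliding_w    <- window_bounds(years, window = 2, strategy = "sliding")
cumulative_w <- window_bounds(years, window = 2, strategy = "cumulative")
```

Each row would then select the records whose time value lies between start and end before the chosen network function is applied to that subset.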
Prepare network for cograph::splot()
Description
Converts a bibnets edge list to a cograph_network object by calling
cograph::as_cograph(). Optionally merges node metadata (e.g., from
local_citations()) into the network's node table so attributes like
lcs or year can be used directly in splot() aesthetic parameters
(e.g., node_size = "lcs").
Usage
to_cograph(edges, nodes = NULL, directed = FALSE)
Arguments
edges |
A data frame with at least |
nodes |
Optional data frame of node attributes with an |
directed |
Logical. Default |
Details
Note: bibnets edge lists (from, to, weight) are accepted directly
by cograph::splot() without conversion. This function is only needed
when you want to attach node-level metadata.
Value
A cograph_network object (S3 list with $nodes and $edges).
Examples
data(biblio_data)
# Without metadata: splot() accepts bibnets edges directly
edges <- author_network(biblio_data, "collaboration")
# With metadata: document network + local citation scores as node size
edges <- document_network(biblio_data, type = "coupling")
nodes <- local_citations(biblio_data) # keyed by document id
net <- to_cograph(edges, nodes = nodes)
Export to Gephi node and edge tables
Description
Converts a bibnets edge list (and optional node table) to the CSV format
expected by Gephi's Data Laboratory. Column names are remapped to Gephi
conventions (Source, Target, Weight, Id, Label).
Usage
to_gephi(edges, nodes = NULL, file = NULL, directed = FALSE)
Arguments
edges |
A data frame with at least |
nodes |
Optional data frame of node attributes. Must contain an |
file |
Optional directory path. If supplied, writes |
directed |
Logical. Sets the |
Value
If file = NULL: a list with $nodes and $edges data frames.
If file is a directory path: writes two CSV files and invisibly returns
the file paths.
Examples
data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
gephi <- to_gephi(edges)
head(gephi$edges)
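The column remapping described above amounts to a rename. The following base-R sketch shows the idea under the assumption that node labels simply repeat the node ids; it does not reproduce any extra columns to_gephi() may add.

```r
edges <- data.frame(
  from   = c("A", "A"),
  to     = c("B", "C"),
  weight = c(5, 1)
)

# Edge table with Gephi's column names
gephi_edges <- data.frame(
  Source = edges$from,
  Target = edges$to,
  Weight = edges$weight
)

# Node table built from the edge endpoints (Label = Id is an assumption)
ids <- sort(unique(c(edges$from, edges$to)))
gephi_nodes <- data.frame(Id = ids, Label = ids)
```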
Export to GraphML
Description
Writes a bibnets edge list (and optional node attributes) to a GraphML file using only base R; no XML package is required.
Usage
to_graphml(edges, nodes = NULL, file = NULL, directed = FALSE)
Arguments
edges |
A data frame with at least |
nodes |
Optional data frame of node attributes with an |
file |
File path to write. If |
directed |
Logical. Default |
Value
If file = NULL: GraphML as a character string. Otherwise writes
the file and returns the path invisibly.
Examples
data(biblio_data)
edges <- keyword_network(biblio_data)
xml <- to_graphml(edges)
cat(substr(xml, 1, 300))
Convert edge data frame to igraph
Description
Convert edge data frame to igraph
Usage
to_igraph(edges, directed = FALSE)
Arguments
edges |
A data frame with at least |
directed |
Logical. Default |
Value
An igraph graph object.
Examples
data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
g <- to_igraph(edges)
Convert edge data frame to adjacency matrix
Description
Convert edge data frame to adjacency matrix
Usage
to_matrix(edges, symmetric = TRUE)
Arguments
edges |
A data frame with |
symmetric |
Logical. If |
Value
A sparse Matrix.
Examples
data(biblio_data)
edges <- reference_network(biblio_data, min_occur = 2)
to_matrix(edges)
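For readers who want to see what such a conversion involves, this sketch builds a symmetric sparse adjacency matrix with Matrix::sparseMatrix(). It is an illustration rather than the package's code, and it assumes an edge list without self-loops or duplicate pairs, since sparseMatrix() sums repeated entries.

```r
library(Matrix)

edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(5, 1, 4)
)

nodes <- sort(unique(c(edges$from, edges$to)))
i <- match(edges$from, nodes)
j <- match(edges$to, nodes)

# Mirror every edge so the matrix is symmetric
m <- sparseMatrix(
  i = c(i, j),
  j = c(j, i),
  x = c(edges$weight, edges$weight),
  dims = c(length(nodes), length(nodes)),
  dimnames = list(nodes, nodes)
)
m
```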
Convert edge data frame to tbl_graph
Description
Convert edge data frame to tbl_graph
Usage
to_tbl_graph(edges, directed = FALSE)
Arguments
edges |
A data frame with at least |
directed |
Logical. Default |
Value
A tbl_graph object (tidygraph).
Examples
data(biblio_data)
edges <- keyword_network(biblio_data)
tg <- to_tbl_graph(edges)
Warn when a separator likely failed to split a multi-entity column
Description
Splitting with the wrong separator silently yields one "entity" per row (e.g., a whole author byline as a single node). The heuristic warns when no row was split into more than one entity, yet most non-empty strings contain a common structural delimiter.
Usage
warn_if_sep_mismatch(col, parts, field, sep)
Details
Only structural delimiters (";", "|", tab) are considered, because
they essentially never occur inside a single legitimate label. Commas
and " and " are deliberately excluded: they appear inside valid
single values (e.g. "Last, First" author names, one-reference-per-row
citation strings, or organisations like "Smith and Sons"), so warning
on them would mislead users with correct data.
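The heuristic can be sketched as a predicate over the raw column and its split result. sep_mismatch_sketch() is hypothetical, and the 0.5 cutoff used for "most" is an assumption; the package may use a different proportion.

```r
sep_mismatch_sketch <- function(col, parts) {
  non_empty <- col[!is.na(col) & nzchar(col)]
  if (length(non_empty) == 0) return(FALSE)
  # Condition 1: the split never produced more than one entity per row
  never_split <- all(lengths(parts) <= 1)
  # Condition 2: most raw strings contain ";", "|", or a tab
  has_delim <- grepl("[;|\t]", non_empty)
  never_split && mean(has_delim) > 0.5
}

col <- c("Smith J|Jones A", "Davis M|Lee K", "Brown P")

wrong <- strsplit(col, ";", fixed = TRUE)  # wrong separator: nothing splits
right <- strsplit(col, "|", fixed = TRUE)  # correct separator

flag_wrong <- sep_mismatch_sketch(col, wrong)
flag_right <- sep_mismatch_sketch(col, right)
```

A column of "Last, First" names split on ";" would not be flagged, because commas are not among the structural delimiters checked.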