---
title: "Getting started with scopusflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with scopusflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

```{r setup}
library(scopusflow)
```

This vignette is fully reproducible without a Scopus API key. It draws on a small
static fixture bundled with the package, so the whole workflow can be shown
offline. The few steps that genuinely need the API are shown but not run.

## Describing a search as a plan

A plan separates describing a search from executing it. Plans are inspectable,
saveable and version-controllable. They can also be partitioned, for example by
year, so that each cell of a large retrieval stays under the API's `start < 5000`
paging ceiling and can be cached and resumed on its own.

```{r}
plan <- scopus_plan(
  "machine translation",
  years     = 2018:2020,
  field     = "TITLE-ABS-KEY",
  partition = "year"
)
plan
```
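
To see why partitioning helps, here is the paging arithmetic in base R. It is a
sketch under two assumptions that do not come from scopusflow: the per-year hit
counts are made up, and each request returns a page of 25 records (the page
size you can actually request may differ).

```{r}
# Illustrative per-year hit counts (made up, not Scopus data).
counts <- c(`2018` = 3100, `2019` = 3900, `2020` = 4600)

# Offset of the last page needed to page through `n` records,
# assuming `page_size` records per request.
last_start <- function(n, page_size = 25) page_size * (ceiling(n / page_size) - 1)

# A cell can be paged by offset only if its last offset is below 5000.
fits <- function(n, page_size = 25) last_start(n, page_size) < 5000

fits(sum(counts))                  # one unpartitioned query
vapply(counts, fits, logical(1))   # one query per year
```

A single query over all three years would need offsets well past 5000, while
each yearly cell stays below the ceiling. A year that exceeded it on its own
would still need splitting further.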

Each row is one query cell. Field tags wrap the query and years become a date
filter:

```{r}
scopus_plan("language learning", field = "TITLE")$query
scopus_plan("x", years = 2015:2020)$date
```
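
The transformation can be mimicked with two small helpers. These are
hypothetical, not scopusflow internals: they only assume Scopus's `FIELD(terms)`
syntax and a `first-last` year range. The chunk above shows what `scopus_plan()`
itself produces.

```{r}
# Hypothetical helpers, not scopusflow internals.
# Wrap a query in a Scopus field tag, e.g. TITLE(...).
wrap_field <- function(query, field) sprintf("%s(%s)", field, query)

# Collapse a vector of years into a "first-last" range string.
year_range <- function(years) {
  if (length(years) == 1) return(as.character(years))
  sprintf("%d-%d", min(years), max(years))
}

wrap_field("language learning", "TITLE")
year_range(2015:2020)
```

A range string like this also covers any years missing from a non-contiguous
`years` vector.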

## Sizing and fetching

With a key configured, you can size a search cheaply with `scopus_count()` and
then execute the plan with `scopus_fetch_plan()`, optionally caching each cell so
that an interrupted run resumes without re-spending quota. Both calls contact the
API, so they are not evaluated here:

```{r eval = FALSE}
scopus_count("machine translation", years = 2018:2020, field = "TITLE-ABS-KEY")

records <- scopus_fetch_plan(plan, cache_dir = scopus_cache_dir(), resume = TRUE)
```
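
Resuming can be pictured as a per-cell cache: a cell whose result is already on
disk is read back instead of being requested again. The sketch below assumes one
`.rds` file per cell and uses a hypothetical `fake_fetch()` in place of a real
request. It illustrates the idea rather than `scopus_fetch_plan()`'s actual
implementation.

```{r}
# Read each cell from `cache_dir` if present, otherwise fetch and save it.
fetch_with_cache <- function(cells, fetch_cell, cache_dir) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  lapply(cells, function(cell) {
    path <- file.path(cache_dir, paste0(cell, ".rds"))
    if (file.exists(path)) return(readRDS(path))  # cached: no request made
    result <- fetch_cell(cell)
    saveRDS(result, path)
    result
  })
}

# Hypothetical stand-in for an API call that counts how often it runs.
calls <- 0
fake_fetch <- function(cell) {
  calls <<- calls + 1
  data.frame(cell = cell)
}

cache <- file.path(tempdir(), "cache-sketch")
unlink(cache, recursive = TRUE)
cells <- c("2018", "2019", "2020")
first  <- fetch_with_cache(cells, fake_fetch, cache)  # fetches all three
second <- fetch_with_cache(cells, fake_fetch, cache)  # reads all three from disk
calls
identical(first, second)
```

Deleting one cached file and rerunning would fetch only that cell, which is
what lets an interrupted run pick up where it stopped.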

## The record schema

Whether records come from the API or from the bundled example data, they share
one stable schema. The package ships a small, already normalised set, which we
use here to continue offline:

```{r}
records <- example_records
records
```

`scopus_records()` produces this same shape from a raw API response, flattening
the nested result into one row per record.
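
To make "flattening" concrete, here is a base R sketch over a simplified,
hypothetical response in which each entry is a list of named fields and a
missing field becomes `NA`. The field names follow Scopus's JSON conventions
(`dc:title`, `prism:doi`), but a real response carries many more fields and
`scopus_records()` handles them in its own way.

```{r}
# A simplified, hypothetical response: a list of entries, each a list of
# named fields. The second entry has no DOI.
response <- list(entry = list(
  list(`dc:title` = "Paper A", `prism:doi` = "10.1000/a"),
  list(`dc:title` = "Paper B")
))

# Return a field as a string, or NA when the entry lacks it.
field_or_na <- function(entry, name) {
  value <- entry[[name]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# One row per entry, one column per field of interest.
flat <- data.frame(
  title = vapply(response$entry, field_or_na, character(1), name = "dc:title"),
  doi   = vapply(response$entry, field_or_na, character(1), name = "prism:doi"),
  stringsAsFactors = FALSE
)
flat
```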

## DOIs and change tracking

Extract a clean, deduplicated DOI list for import into a reference manager, and
compare two retrievals to see exactly what changed:

```{r}
dois <- scopus_extract_dois(records)
dois

# Suppose a later retrieval added one DOI and dropped another.
later <- c(dois[-1], "10.1000/example.999")
scopus_diff_dois(old = dois, new = later)
```
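
Deduplication only works if equivalent DOIs compare equal. The sketch below shows
one plausible normalisation in base R; its rules are assumptions, not necessarily
the ones `scopus_extract_dois()` applies. Lower-casing is safe because DOI names
are case-insensitive, and the diff itself is just two set differences.

```{r}
# Assumed normalisation: trim whitespace, drop a doi.org URL prefix,
# lower-case, then remove blanks, NAs and duplicates.
normalise_doi <- function(x) {
  x <- trimws(x)
  x <- sub("^https?://(dx\\.)?doi\\.org/", "", x, ignore.case = TRUE)
  x <- tolower(x)
  unique(x[!is.na(x) & nzchar(x)])
}

old_dois <- normalise_doi(c("10.1000/A", "https://doi.org/10.1000/a", "10.1000/b", NA))
new_dois <- normalise_doi(c("10.1000/b", "10.1000/c"))

# What the later retrieval added and what it dropped.
list(added = setdiff(new_dois, old_dois), removed = setdiff(old_dois, new_dois))
```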

You can write the DOIs to a path you specify:

```{r}
out <- file.path(tempdir(), "dois.csv")
scopus_extract_dois(records, file = out)
readLines(out)
```

## Comparing topic trends

`scopus_compare_topics()` issues one count request per term per year, so it needs
the API. Its output has a fixed shape, which we reproduce here to show the plot:

```{r eval = FALSE}
cmp <- scopus_compare_topics(
  reference_query  = "language learning",
  comparison_terms = c("effect size", "Bayesian"),
  years            = 2015:2020,
  field            = "TITLE-ABS-KEY"
)
```
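
Because each request spends quota, it helps to budget before calling. The sketch
below assumes that the reference query is counted once per year alongside each
comparison term, and that a term's share is its count as a percentage of the
reference count for the same year, averaged over years. Both are read from the
output's column names rather than from documented behaviour, and the counts are
illustrative.

```{r}
# Assumed request budget: one count for the reference query plus one per
# comparison term, in every year.
comparison_terms <- c("effect size", "Bayesian")
years <- 2015:2020
n_requests <- (1 + length(comparison_terms)) * length(years)
n_requests

# Assumed share: a term's hits as a percentage of the reference query's
# hits in the same year (illustrative counts for three years).
term_n      <- c(20, 30, 45)
reference_n <- c(100, 120, 150)
share <- 100 * term_n / reference_n
share
mean(share)
```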

```{r}
# A stand-in comparison object with the same columns scopus_compare_topics()
# returns, so the plotting step is reproducible offline. The counts are
# illustrative, not real Scopus data.
cmp <- tibble::tibble(
  query = "q",
  query_type = rep(c("reference", "comparison", "comparison"), each = 6),
  abridged_query = rep(c("language learning", "effect size", "Bayesian"), each = 6),
  year = rep(2015:2020, 3),
  n = c(rep(100, 6), 20, 24, 30, 33, 40, 45, 5, 7, 9, 12, 15, 19),
  reference_n = rep(100, 18),
  comparison_percentage = c(rep(100, 6), 20, 24, 30, 33, 40, 45, 5, 7, 9, 12, 15, 19),
  average_comparison_percentage = rep(c(100, 32, 11.2), each = 6)
)
class(cmp) <- c("scopus_comparison", class(cmp))
cmp
```

```{r fig.alt = "Line chart of two topics' share of the reference literature over time", fig.width = 7, fig.height = 4.5}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot_scopus_comparison(cmp)
}
```

## Export and interoperability

Convert records for `bibliometrix`-style workflows with `as_bibliometrix()`, or
save and reload them with `write_scopus_records()` and `read_scopus_records()`:

```{r}
head(as_bibliometrix(records))

path <- file.path(tempdir(), "records.rds")
write_scopus_records(records, path)
identical(read_scopus_records(path), records)
```
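
`bibliometrix` data frames use Web of Science-style two-letter field tags as
column names, such as `TI` (title), `PY` (year), `DI` (DOI) and `SO` (source). The
sketch below shows that kind of renaming with hypothetical input column names;
`names(as_bibliometrix(records))` shows the columns the package actually
produces.

```{r}
# Hypothetical input columns mapped to bibliometrix-style field tags.
tag_map <- c(title = "TI", year = "PY", doi = "DI", source = "SO")

rec <- data.frame(
  title = "Paper A", year = 2020, doi = "10.1000/a", source = "Journal X",
  stringsAsFactors = FALSE
)

# Rename mapped columns and leave any others untouched.
mapped <- names(rec) %in% names(tag_map)
names(rec)[mapped] <- tag_map[names(rec)[mapped]]
names(rec)
```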

## Handling failures

Network and API problems surface as typed conditions, all inheriting from
`scopus_error`, so a workflow can respond to them in code:

```{r eval = FALSE}
tryCatch(
  scopus_fetch("..."),
  scopus_error_no_key     = function(e) message("No API key configured."),
  scopus_error_rate_limit = function(e) message("Rate limited; backing off."),
  scopus_error            = function(e) message("Scopus error: ", conditionMessage(e))
)
```
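
The dispatch can be reproduced offline with a hand-built condition. In the sketch
below, the constructor `scopus_condition()` is hypothetical and not scopusflow's
own code; only the class names come from this vignette. It shows why the
specific classes must be listed before `scopus_error`: `tryCatch()` tries its
handlers in the order given.

```{r}
# Hypothetical constructor for a classed error; the class names follow the
# vignette, but this is not scopusflow's own code.
scopus_condition <- function(subclass, message) {
  structure(
    list(message = message, call = NULL),
    class = c(subclass, "scopus_error", "error", "condition")
  )
}

# The specific class comes first; the parent class `scopus_error`
# catches every other Scopus error.
handle <- function(cond) {
  tryCatch(
    stop(cond),
    scopus_error_rate_limit = function(e) "rate limit",
    scopus_error            = function(e) paste("other:", conditionMessage(e))
  )
}

handle(scopus_condition("scopus_error_rate_limit", "too many requests"))
handle(scopus_condition("scopus_error_no_key", "no API key"))
```

Reversing the two handlers would route every error, rate limits included, to the
generic branch.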