New function bold_identify_taxonomy() to add taxonomic information to the
output of bold_identify() and replace bold_identify_parents(). Instead of
taking the taxon names from the bold_identify() output, using
bold_tax_name() to get the taxonomic ID and then passing it to
bold_tax_id() to get the parent names, we take the process IDs from the
bold_identify() output and pass them to bold_specimens(). This is faster
and, more importantly, makes sure the correct taxonomy is returned. The
function has fewer arguments since filtering the result is no longer
necessary. Since the result now has only one line per row of input, the
output is always in ‘wide’ format (like when using bold_identify_parents()
with wide=TRUE). There is one new argument, taxOnly, which is TRUE by
default and returns only the taxonomic data. However, since
bold_specimens() also returns other data (habitat, country, image_url,
etc.), setting this argument to FALSE will also join that data to the
input.
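The join by process ID described above can be sketched in base R on mock data. This is only an illustration of the idea, not the package's implementation: the helper add_taxonomy_sketch(), its tax_only argument, the column names (ID, processid, the *_name columns) and all values are assumptions.

```r
# Mock stand-in for one bold_identify() result: one row per match.
# Column names and values are illustrative assumptions.
matches <- data.frame(
  ID = c("PROC001-10", "PROC002-10", "PROC003-10"),
  similarity = c(1.00, 0.99, 0.98),
  stringsAsFactors = FALSE
)

# Mock stand-in for what bold_specimens() might return for those process IDs.
specimens <- data.frame(
  processid = c("PROC001-10", "PROC002-10"),
  phylum_name = c("Arthropoda", "Arthropoda"),
  order_name = c("Lepidoptera", "Diptera"),
  country = c("Canada", "Mexico"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: left-join specimen data onto the matches by process ID.
# tax_only mimics the idea behind taxOnly; it is not the real argument handling.
add_taxonomy_sketch <- function(matches, specimens, tax_only = TRUE) {
  if (tax_only) {
    # Keep only the key and the taxonomy columns (assumed to end in "_name").
    keep <- c("processid", grep("_name$", names(specimens), value = TRUE))
    specimens <- specimens[, keep, drop = FALSE]
  }
  out <- merge(matches, specimens, by.x = "ID", by.y = "processid",
               all.x = TRUE, sort = FALSE)
  # Restore the input row order so there is one output row per input row.
  out[match(matches$ID, out$ID), , drop = FALSE]
}

wide <- add_taxonomy_sketch(matches, specimens)
full <- add_taxonomy_sketch(matches, specimens, tax_only = FALSE)
```

Matches without a specimen record keep their row and get NA in the taxonomy columns, which is what keeps the output aligned with the input.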
New function bold_tax_id2() which will eventually replace bold_tax_id().
The main changes are in the format of the output. For the dataTypes
‘basic’, ‘stats’, ‘images’ and ‘thirdparty’, the output doesn’t change. For
the dataTypes ‘sequencinglabs’, ‘geo’ and ‘depository’, instead of one
(sometimes very) wide data.frame, the result is now in ‘long’ format, with
the columns ‘input’, ‘taxid’, ‘sequencinglabs|country|depository’
(whichever applies) and ‘count’. For the dataTypes ‘all’, or when selecting
more than one of the dataTypes, the output is a list with one data.frame
per data type. When setting includeTree to TRUE, the parents’ data is
appended (with rbind) to their respective data.frame. The function also
checks that all arguments are of the correct type and that the dataTypes
chosen are valid.
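To make the ‘wide’ versus ‘long’ distinction concrete, here is a minimal base R sketch on mock data for the ‘geo’ case. The taxon IDs, country names and counts are invented for illustration, and the reshaping code is an assumption about the shape of the result, not the code used inside bold_tax_id2().

```r
# Mock "wide" result: one column per country (values are invented).
wide <- data.frame(
  input = c(88899, 125295),
  taxid = c(88899, 125295),
  Brazil = c(3, NA),
  Canada = c(NA, 10),
  Mexico = c(5, 2)
)

id_cols <- c("input", "taxid")
value_cols <- setdiff(names(wide), id_cols)

# Stack one block of rows per country column, keeping the ID columns.
long <- do.call(rbind, lapply(value_cols, function(col) {
  data.frame(wide[id_cols], country = col, count = wide[[col]],
             stringsAsFactors = FALSE)
}))

# Drop the empty cells that made the wide table sparse, then tidy up.
long <- long[!is.na(long$count), ]
long <- long[order(long$taxid, long$country), ]
rownames(long) <- NULL
```

The long table has a fixed set of four columns however many countries are returned, which is the practical benefit over the wide layout.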
The now deprecated bold_tax_id() has the same argument checks as
bold_tax_id2() but will throw warnings instead of errors so as not to
affect existing workflows. Also, if one of the chosen dataTypes is invalid,
it is removed to avoid making unnecessary requests.
Similarly, the now deprecated bold_identify_parents() has new argument
checks and will throw warnings so as not to affect existing workflows.
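The difference between the strict checks and the lenient, warning-only checks can be sketched as follows. The helper check_data_types_sketch() and its strict argument are hypothetical and exist only for this illustration; the list of valid dataTypes is the one named in these notes.

```r
# dataTypes named in these notes.
valid_types <- c("basic", "stats", "geo", "images",
                 "sequencinglabs", "depository", "thirdparty", "all")

# Hypothetical helper: strict = TRUE errors (new-function behaviour),
# strict = FALSE warns and drops invalid values (deprecated-function behaviour).
check_data_types_sketch <- function(dataTypes, strict = TRUE) {
  if (!is.character(dataTypes)) {
    msg <- "'dataTypes' must be a character vector."
    if (strict) stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
    dataTypes <- as.character(dataTypes)
  }
  bad <- setdiff(dataTypes, valid_types)
  if (length(bad) > 0) {
    msg <- paste0("Invalid dataTypes: ", paste(bad, collapse = ", "))
    if (strict) stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
    # Remove invalid values so no request is made for them.
    dataTypes <- intersect(dataTypes, valid_types)
  }
  dataTypes
}

# Lenient mode: warns, then continues with the valid values only.
kept <- suppressWarnings(
  check_data_types_sketch(c("basic", "nope"), strict = FALSE)
)

# Strict mode: the same input raises an error instead.
strict_result <- tryCatch(
  check_data_types_sketch(c("basic", "nope")),
  error = function(e) e
)
```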
For bold_tax_id2() and bold_tax_name(), when querying multiple taxa, if one
fails, the loop won’t break and will instead throw the API error as a
warning. The output object will also have 2 new attributes, “errors” and
“params”, that let you see which errors occurred for which request and what
parameters were used for the request. To make it easy to retrieve these
attributes, 3 new functions have been created:
bold_get_attr() will return a list of the two attributes
bold_get_errors() will return a list of the errors
bold_get_params() will return a list of parameters used
bold_specimens() and bold_seqspec() have a new parameter cleanData which,
when set to TRUE, replaces empty strings ("") by NAs and strings containing
only duplicated values by their unique value (e.g. “COI-5P|COI-5P|COI-5P”
becomes “COI-5P”).
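The effect of cleanData on a single character column can be sketched in base R. The helper clean_field_sketch() is hypothetical and only mirrors the two rules stated above (empty string to NA, all-duplicate values collapsed); it assumes "|" is the separator, as in the example, and is not the package's own code.

```r
# Hypothetical helper mirroring the two cleanData rules on a character vector.
clean_field_sketch <- function(x, sep = "|") {
  # Rule 1: empty strings become NA.
  x[!is.na(x) & x == ""] <- NA_character_
  vapply(x, function(value) {
    if (is.na(value)) return(NA_character_)
    parts <- unique(strsplit(value, sep, fixed = TRUE)[[1]])
    # Rule 2: collapse only when every part is the same value.
    if (length(parts) == 1L) parts else value
  }, character(1), USE.NAMES = FALSE)
}

cleaned <- clean_field_sketch(c("COI-5P|COI-5P|COI-5P", "", "COI-5P|28S"))
```

Strings that mix different values, such as "COI-5P|28S", are left untouched because collapsing them would lose information.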
New function bold_read_trace() to replace read_trace(). Can read one or
multiple trace files from a boldtrace object or provided file path(s).
New function b_sepFasta() to use after a call to bold_seqspec() where
sepFasta wasn’t set to TRUE.
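As a rough illustration of what separating the fasta data after the fact means, the sketch below splits a mock combined table into specimen data and a list of sequences named by process ID. It assumes that is the shape sepFasta produces; the helper sep_fasta_sketch(), the column names and the values are all made up for the example and do not reflect the actual signature of b_sepFasta().

```r
# Mock combined specimen + sequence table (column names are assumptions).
records <- data.frame(
  processid = c("PROC001-10", "PROC002-10"),
  species_name = c("Species one", "Species two"),
  nucleotides = c("ACGTAC", "TTGACA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: move the sequences out of the table into a named list.
sep_fasta_sketch <- function(x) {
  fasta <- as.list(x$nucleotides)
  names(fasta) <- x$processid
  list(
    data = x[, setdiff(names(x), "nucleotides"), drop = FALSE],
    fasta = fasta
  )
}

res <- sep_fasta_sketch(records)
```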
bold_trace() function
bold_specimens() and bold_seqspec() can now also return partial output like
bold_seq()
data.table when possible, removed dplyr and reshape dependencies
stringi instead of stringr which removed stringr’s other dependencies
bold_seq(), bold_seqspec() and bold_specimens() that if the taxon doesn’t
have public records, using another parameter will return all data for that
parameter. Users can verify the availability of public records with
bold_stats(). A note was also added in bold_tax_name() that the column
‘specimenrecords’ relates to the records in the taxonomy browser and not in
the public data portal. (#76)
bold_tax_id()
(#83). Added a line in the function to change ‘depositories’ to
‘depository’ in case people had been using that.
bold_tax_name() to double escape single quotes. Otherwise it doesn’t return
the data (#84, #85). Since it’s related to the API, this means that the
data that comes back also contains errors. So I added a function to repair
the names of ‘taxon’, ‘taxonrep’ and ‘parentname’ in the returned object.
The function is also used in pipe_params() (which is used by bold_seq(),
bold_seqspec() and bold_specimens()) to repair the taxon parameter in case
users use results from previous versions.
bold_seqspec()
is read (#87, #88) thanks @cjfields
bold_stats() documentation to specify that the record counts include all
gene markers (#90).
bold_seqspec() - we now set the encoding to “UTF-8” before parsing the
string to XML (#71)
bold_seqspec() fix: capture “Fatal errors” returned by BOLD servers and
pass that along to the user with advice (#66)
bold_seq() and bold_seqspec(). The marker section details that the marker
parameter doesn’t actually filter the results you get, but you can filter
them yourself. The large requests section gives some caveats associated
with large data requests and outlines how to sort it out (#61)
bold_identify_parents() (#64)
sangerseqR - instructions depend on which version of R is being used (#65)
thanks @KevCaz
_R_CHECK_LENGTH_1_LOGIC2_ (#57)
bold_identify()
fix: ampersands needed to be escaped (#62) thanks @devonorourke
vcr to cache responses, speeds up tests significantly, and no longer relies
on an internet connection (#55) (#56)
bold_seq(): sometimes on large requests, the BOLD servers time out, and
give back partial output but don’t indicate that there was an error. We
catch this kind of error now, throw a message for the user, and the
function gives back the partial output given by the server. Also added to
the documentation for bold_seq() and in the README that if you run into
this problem, try many queries that each return a smaller set of results
instead of one or a few larger queries (#52) (#53)
bold_seq(): remove return characters (\r and \n) from sequences (#54)
bold_identify_parents()
gains many new parameters (taxid, taxon, tax_rank, tax_division, parentid,
parentname, taxonrep, specimenrecords) to filter parents based on any of a
number of fields - should solve problem where multiple parents found for a
single taxon, often in different kingdoms (#50)
bold_identify() that the function uses lapply internally, so queries with
lots of sequences can take a long time
bold_specimens(): use rawToChar() on raw bytes instead of parse() from crul
(#47)
crul
for HTTP requests. Only really affects users in that specifying curl
options works slightly differently (#42)
marker parameter in bold_seqspec was and maybe still is not working, in the
sense that using the parameter doesn’t always limit results to the marker
you specify. Not really fixed - watch out for it, and filter after you get
results back to get markers you want. (#25)
bold_identify_parents - was failing when no match for a parent name. (#41)
thx @VascoElbrecht
tsv results were erroring in bold_specimens and other fxns (#46) - fixed by
switching to new BOLD v4 API (#30)
stats and utils - replaced is with inherits (#39)
bold_identify_parents()
to add taxonomic information to the output of bold_identify(). We take the
taxon names from bold_identify output, and use bold_tax_name to get the
taxonomic ID, passing it to bold_tax_id to get the parent names, then
attach those to the input data. There are two options given what you put
for the wide parameter. If TRUE you get data.frames of the same dimensions
with parent rank name and ID as new columns (for each name going up the
hierarchy) - while if FALSE you get a long data.frame. thanks @dougwyu for
inspiring this (#36)
xml2::xml_find_one
with xml2::xml_find_first (#33)
db options in bold_identify man file - COX1 and COX1_SPECIES were switched
(#37) thanks for pointing that out @dougwyu
bold_tax_id for when some elements returned from the BOLD API were
empty/NULL (#32) thanks @fmichonneau !!
xml2 from XML as the XML parser for this package (#26)
bold_trace() to create dir and tar file when it doesn’t already exist
content(x, "text"), so now using rawToChar(content(x)), which works (#24)
sangerseqR package now in Suggests for reading trace files, and is only
used in bold_trace() function.
bold_trace()
gains two new parameters: overwrite to choose whether to overwrite an
existing file of the same name or not, progress to show a progress bar for
downloading or not.
bold_trace() gains a print method to show a tidy summary of the trace file
downloaded.
bold_tax_name() (#17) and bold_tax_id() (#18) in which species that were
missing from the BOLD database returned empty arrays but 200 status codes.
Parsing those as failed attempts now. Also fixes problem in taxize in
bold_search() that use these two functions.
bold_tax_name() and bold_tax_id(), which search for taxonomic data from
BOLD using either names or BOLD identifiers, respectively. (#11)
jsonlite and reshape.
callopts parameter changed to ... throughout the package, so that passing
on options to httr::GET is done via named parameters, e.g.,
config=verbose(). (#13)
httr (v0.4) (#9), and added a few more tests (#7)