Title: | CITation ANalysis Toolpack |
---|---|
Description: | Supports quantitative research in scientometrics and bibliometrics. Provides various tools for preprocessing bibliographic data retrieved, e.g., from Elsevier's SciVerse Scopus, computing bibliometric impact of individuals, or modelling phenomena encountered in the social sciences. This package is deprecated, see 'agop' instead. |
Authors: | Marek Gagolewski [aut, cre] |
Maintainer: | Marek Gagolewski <[email protected]> |
License: | LGPL (>= 3) |
Version: | 2022.1.1 |
Built: | 2025-01-12 03:04:01 UTC |
Source: | https://github.com/gagolews/citan |
CITAN is a library of functions useful in — but not limited to — quantitative research in the field of scientometrics.
The package is deprecated, see agop instead.
For the complete list of functions, call library(help="CITAN").
Marek Gagolewski
Dubois D., Prade H., Testemale C. (1988). Weighted fuzzy pattern matching,
Fuzzy Sets and Systems 28, 313-331.
Egghe L. (2006). Theory and practise of the g-index, Scientometrics 69(1),
131-152.
Gagolewski M., Grzegorzewski P. (2009). A geometric approach to the construction of
scientific impact indices, Scientometrics 81(3), 617-634.
Gagolewski M., Debski M., Nowakiewicz M. (2009). Efficient algorithms for computing
"geometric" scientific impact indices, Research Report of Systems
Research Institute, Polish Academy of Sciences RB/1/2009.
Gagolewski M., Grzegorzewski P. (2010a). S-statistics and their basic properties,
In: Borgelt C. et al (Eds.), Combining Soft Computing and Statistical
Methods in Data Analysis, Springer-Verlag, 281-288.
Gagolewski M., Grzegorzewski P. (2010b). Arity-monotonic extended aggregation
operators, In: Hullermeier E., Kruse R., Hoffmann F. (Eds.),
Information Processing and Management of Uncertainty in Knowledge-Based
Systems, CCIS 80, Springer-Verlag, 693-702.
Gagolewski M. (2011). Bibliometric Impact Assessment with R and the CITAN Package,
Journal of Informetrics 5(4), 678-692.
Gagolewski M., Grzegorzewski P. (2011a). Axiomatic Characterizations of (quasi-)
L-statistics and S-statistics and the Producer Assessment Problem,
for Fuzzy Logic and Technology (EUSFLAT/LFA 2011), Atlantis Press, 53-58.
Grabisch M., Pap E., Marichal J.-L., Mesiar R. (2009). Aggregation functions,
Cambridge.
Gagolewski M., Grzegorzewski P. (2011b). Possibilistic analysis of arity-monotonic
aggregation operators and its relation to bibliometric impact assessment
of individuals, International Journal of Approximate Reasoning 52(9), 1312-1324.
Hirsch J.E. (2005). An index to quantify an individual's scientific research output,
Proceedings of the National Academy of Sciences 102(46),
16569-16572.
Kosmulski M. (2007). MAXPROD - A new index for assessment of the scientific output
of an individual, and a comparison with the h-index, Cybermetrics 11(1).
Woeginger G.J. (2008). An axiomatic characterization of the Hirsch-index,
Mathematical Social Sciences 56(2), 224-232.
Zhang J., Stevens M.A. (2009). A New and Efficient Estimation Method for the
Generalized Pareto Distribution, Technometrics 51(3), 316-325.
Converts an object of class authorinfo to a character string.
Such an object is returned by e.g. lbsGetInfoAuthors.
## S3 method for class 'authorinfo' as.character(x, ...)
x | a single object of class authorinfo. |
... | unused. |
An authorinfo object is a list with the following components:
IdAuthor: numeric; author's identifier in the table Biblio_Authors,
Name: character; author's name.
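As an illustration of the structure described above, the following base R sketch builds such a list by hand and turns it into a string. The helper format_authorinfo is hypothetical and the sample values are taken from the Examples elsewhere in this manual; the output format of the package's own as.character method may differ.

```r
# A hand-built object with the two documented components.
author <- structure(
  list(IdAuthor = 39264, Name = "Liu X."),
  class = "authorinfo"
)

# Hypothetical formatter (not the package's as.character.authorinfo).
format_authorinfo <- function(x) {
  stopifnot(inherits(x, "authorinfo"))
  sprintf("IdAuthor: %s, Name: %s", format(x$IdAuthor), x$Name)
}

s <- format_authorinfo(author)
```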
A character string
print.authorinfo, lbsSearchAuthors, lbsGetInfoAuthors
Converts an object of class docinfo to a character string.
Such an object is returned by e.g. lbsGetInfoDocuments.
## S3 method for class 'docinfo' as.character(x, ...)
x | a single object of class docinfo. |
... | unused. |
A docinfo object is a list with the following components:
IdDocument: numeric; document identifier in the table Biblio_Documents,
Authors: list of authorinfo objects (see e.g. as.character.authorinfo),
Title: title of the document,
BibEntry: bibliographic entry,
AlternativeId: unique character identifier,
Pages: number of pages,
Citations: number of citations,
Year: publication year,
Type: type of document, see lbsCreate.
A character string
lbsSearchDocuments, as.character.authorinfo, print.docinfo, lbsGetInfoDocuments
Executes an SQL query and immediately frees all allocated resources.
dbExecQuery(conn, statement, rollbackOnError = FALSE)
conn |
a |
statement |
a character string with the SQL statement to be executed. |
rollbackOnError |
logical; if |
This function may be used to execute queries like CREATE TABLE, UPDATE, INSERT, etc.
It has its own exception handler, which prints out detailed information on caught errors.
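The execute-and-release pattern with error reporting can be sketched with plain DBI calls. This is only an approximation under stated assumptions: the DBI and RSQLite packages are installed, exec_statement is a hypothetical helper (not the package's dbExecQuery), and the Demo table is made up.

```r
library(DBI)

# Hypothetical helper: run one statement, report any error,
# and return the number of affected rows (NA on failure).
exec_statement <- function(conn, statement) {
  tryCatch(
    dbExecute(conn, statement),  # sends the statement and clears the result
    error = function(e) {
      message("SQL error: ", conditionMessage(e))
      NA_integer_
    }
  )
}

conn <- dbConnect(RSQLite::SQLite(), ":memory:")
created  <- exec_statement(conn, "CREATE TABLE Demo (Id INTEGER PRIMARY KEY, Name TEXT)")
inserted <- exec_statement(conn, "INSERT INTO Demo (Name) VALUES ('a'), ('b')")
failed   <- exec_statement(conn, "INSERT INTO NoSuchTable VALUES (1)")
dbDisconnect(conn)
```

The failing statement does not stop the script; the handler prints the message and the caller can decide whether to roll back.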
dbSendQuery, dbClearResult, dbGetQuery
Given a list of authors' citation sequences, the function calculates values of many impact functions at a time.
lbsAssess( citseq, f = list(length, index_h), captions = c("length", "index_h"), orderByColumn = 2, bestRanks = 20, verbose = T )
citseq | list of numeric vectors, e.g. the output of lbsGetCitations. |
f |
a list of |
captions |
a list of |
orderByColumn |
column to sort the results on. |
bestRanks |
if not |
verbose |
logical; |
A data frame in which each row corresponds to the assessment results of one citation sequence.
The first column holds the authors' names (taken from names(citseq)), the second the value of f[[1]], the third the value of f[[2]], and so on.
See Examples below.
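The shape of the returned data frame can be sketched in base R without a database. The functions assess and h_index below are hypothetical stand-ins (not lbsAssess or the agop index functions), sorting in decreasing order is an assumption, and the citation counts are copied from the Examples.

```r
# Citation sequences in the shape produced by lbsGetCitations.
citseq <- list(
  "Liu X."  = c(11, 4, 1, 0, 0, 0),
  "Xu Y."   = c(30, 14, 8, 6, 6, 5, 3, 0),
  "Wang Y." = c(1, 0, 0, 0, 0, 0, 0)
)

# Hypothetical h-index; assumes x is sorted non-increasingly.
h_index <- function(x) sum(x >= seq_along(x))

# Hypothetical stand-in: one row per author, one column per function.
assess <- function(citseq, f, captions, orderByColumn = 2) {
  cols <- lapply(f, function(fun) unname(vapply(citseq, fun, numeric(1))))
  names(cols) <- captions
  out <- data.frame(Name = names(citseq), cols, stringsAsFactors = FALSE)
  out[order(out[[orderByColumn]], decreasing = TRUE), ]
}

result <- assess(citseq,
                 f = list(length, sum, h_index),
                 captions = c("length", "sum", "h"))
```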
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
citseq <- lbsGetCitations(conn,
   surveyDescription="Scientometrics", documentTypes="Article",
   idAuthors=c(39264,39265,39266));
print(citseq);
## $`Liu X.`                                 # Author name
## 40116 34128 39122 29672 32343 32775       # IdDocument
##    11     4     1     0     0     0       # Citation count
## attr(,"IdAuthor")
## [1] 39264                                 # IdAuthor
##
## $`Xu Y.`
## 38680 38605 40035 40030 40124 39829 39745 29672
##    30    14     8     6     6     5     3     0
## attr(,"IdAuthor")
## [1] 39265
##
## $`Wang Y.`
## 29992 29672 29777 32906 33858 33864 34704
##     1     0     0     0     0     0     0
## attr(,"IdAuthor")
## [1] 39266
library("agop")
print(lbsAssess(citseq,
   f=list(length, sum, index.h, index.g, function(x) index.rp(x,1),
      function(x) sqrt(prod(index.lp(x,1))),
      function(x) sqrt(prod(index.lp(x,Inf)))),
   captions=c("length", "sum", "index.h", "index.g", "index.w",
      "index.lp1", "index.lpInf")));
##      Name length sum index.h index.g index.w index.lp1 index.lpInf
## 3   Xu Y.      8  72       5       8       7  8.573214    5.477226
## 2 Wang Y.      7   1       1       1       1  1.000000    1.000000
## 1  Liu X.      6  16       2       4       3  4.157609    3.316625
## ...
dbDisconnect(conn);
## End(Not run)
Clears a Local Bibliometric Storage by dropping all tables named Biblio_* and all views named ViewBiblio_*.
lbsClear(conn, verbose = TRUE)
conn | database connection object, see lbsConnect. |
verbose |
logical; |
For safety reasons, an SQL transaction opened at the beginning of the removal process is not committed (closed) automatically. You should commit it manually (or roll it back), see Examples below.
TRUE on success.
lbsConnect, lbsCreate, Scopus_ImportSources, lbsDeleteAllAuthorsDocuments, dbCommit, dbRollback
## Not run:
conn <- lbsConnect("Bibliometrics.db");
lbsClear(conn);
dbCommit(conn);
lbsCreate(conn);
Scopus_ImportSources(conn);
## ...
lbsDisconnect(conn);
## End(Not run)
Connects to a Local Bibliometric Storage handled by the SQLite engine (see RSQLite package documentation).
lbsConnect(dbfilename)
dbfilename |
filename of an SQLite database. |
Do not forget to close the connection (represented by the connection object returned) with the lbsDisconnect function after use.
Please note that the database may also be accessed by using lower-level functions from the DBI package called on the returned connection object. The table-view structure of a Local Bibliometric Storage is presented in the man page of the lbsCreate function.
An object of type SQLiteConnection, used to communicate with the SQLite engine.
## Not run:
conn <- lbsConnect("Bibliometrics.db")
## ...
lbsDisconnect(conn)
## End(Not run)
Creates an empty Local Bibliometric Storage.
lbsCreate(conn, verbose = TRUE)
conn | a connection object, see lbsConnect. |
verbose |
logical; |
The function may be executed only if the database contains no tables named Biblio_* and no views named ViewBiblio_*.
The following SQL code is executed.
CREATE TABLE Biblio_Categories (
   -- Source classification codes (e.g. ASJC)
   IdCategory       INTEGER PRIMARY KEY ASC,
   IdCategoryParent INTEGER NOT NULL,
   Description      VARCHAR(63) NOT NULL,
   FOREIGN KEY(IdCategoryParent) REFERENCES Biblio_Categories(IdCategory)
);

CREATE TABLE Biblio_Sources (
   IdSource      INTEGER PRIMARY KEY AUTOINCREMENT,
   AlternativeId VARCHAR(31) UNIQUE NOT NULL,
   Title         VARCHAR(255) NOT NULL,
   IsActive      BOOLEAN,
   IsOpenAccess  BOOLEAN,
   Type          CHAR(2) CHECK (Type IN ('bs', 'cp', 'jo')),
      -- Book Series / Conference Proceedings / Journal
      -- or NULL in all other cases
   Impact1       REAL, -- value of an impact factor
   Impact2       REAL, -- value of an impact factor
   Impact3       REAL, -- value of an impact factor
   Impact4       REAL, -- value of an impact factor
   Impact5       REAL, -- value of an impact factor
   Impact6       REAL  -- value of an impact factor
);

CREATE TABLE Biblio_SourcesCategories (
   -- links Sources and Categories
   IdSource   INTEGER NOT NULL,
   IdCategory INTEGER NOT NULL,
   PRIMARY KEY(IdSource, IdCategory),
   FOREIGN KEY(IdSource) REFERENCES Biblio_Sources(IdSource),
   FOREIGN KEY(IdCategory) REFERENCES Biblio_Categories(IdCategory)
);

CREATE TABLE Biblio_Documents (
   IdDocument    INTEGER PRIMARY KEY AUTOINCREMENT,
   IdSource      INTEGER,
   AlternativeId VARCHAR(31) UNIQUE NOT NULL,
   Title         VARCHAR(255) NOT NULL,
   BibEntry      TEXT,
      -- (e.g. Source Title,Year,Volume,Issue,Article Number,PageStart,PageEnd)
   Year          INTEGER,
   Pages         INTEGER,
   Citations     INTEGER NOT NULL,
   Type          CHAR(2) CHECK (Type IN ('ar', 'ip', 'bk', 'cp', 'ed', 'er', 'le', 'no', 'rp', 're', 'sh')),
      -- Article-ar / Article in Press-ip / Book-bk /
      -- Conference Paper-cp / Editorial-ed / Erratum-er /
      -- Letter-le / Note-no / Report-rp / Review-re / Short Survey-sh
      -- or NULL in all other cases
   FOREIGN KEY(IdSource) REFERENCES Biblio_Sources(IdSource),
   FOREIGN KEY(IdLanguage) REFERENCES Biblio_Languages(IdLanguage)
);

CREATE TABLE Biblio_Citations (
   IdDocumentParent INTEGER NOT NULL, -- cited document
   IdDocumentChild  INTEGER NOT NULL, -- reference
   PRIMARY KEY(IdDocumentParent, IdDocumentChild),
   FOREIGN KEY(IdDocumentParent) REFERENCES Biblio_Documents(IdDocument),
   FOREIGN KEY(IdDocumentChild) REFERENCES Biblio_Documents(IdDocument)
);

CREATE TABLE Biblio_Surveys (
   -- each call to lbsImportDocuments() puts a new record here,
   -- they may be grouped into so-called 'Surveys' using 'Description' field
   IdSurvey    INTEGER PRIMARY KEY AUTOINCREMENT,
   Description VARCHAR(63) NOT NULL, -- survey group name
   FileName    VARCHAR(63),          -- original file name
   Timestamp   DATETIME              -- date of file import
);

CREATE TABLE Biblio_DocumentsSurveys (
   -- note that one Document may often be found in many Surveys
   IdDocument INTEGER NOT NULL,
   IdSurvey   INTEGER NOT NULL,
   PRIMARY KEY(IdDocument, IdSurvey),
   FOREIGN KEY(IdSurvey) REFERENCES Biblio_Surveys(IdSurvey),
   FOREIGN KEY(IdDocument) REFERENCES Biblio_Documents(IdDocument)
);

CREATE TABLE Biblio_Authors (
   IdAuthor    INTEGER PRIMARY KEY AUTOINCREMENT,
   Name        VARCHAR(63) NOT NULL,
   AuthorGroup VARCHAR(31) -- used to merge authors with non-unique representations
);

CREATE TABLE Biblio_AuthorsDocuments (
   -- links Authors and Documents
   IdAuthor   INTEGER NOT NULL,
   IdDocument INTEGER NOT NULL,
   PRIMARY KEY(IdAuthor, IdDocument),
   FOREIGN KEY(IdAuthor) REFERENCES Biblio_Authors(IdAuthor),
   FOREIGN KEY(IdDocument) REFERENCES Biblio_Documents(IdDocument)
);
In addition, the following views are created.
CREATE VIEW ViewBiblio_DocumentsSurveys AS
   SELECT
      Biblio_DocumentsSurveys.IdDocument AS IdDocument,
      Biblio_DocumentsSurveys.IdSurvey AS IdSurvey,
      Biblio_Surveys.Description AS Description,
      Biblio_Surveys.Filename AS Filename,
      Biblio_Surveys.Timestamp AS Timestamp
   FROM Biblio_DocumentsSurveys
   JOIN Biblio_Surveys
      ON Biblio_DocumentsSurveys.IdSurvey=Biblio_Surveys.IdSurvey;

CREATE VIEW ViewBiblio_DocumentsCategories AS
   SELECT
      IdDocument AS IdDocument,
      DocSrcCat.IdCategory AS IdCategory,
      DocSrcCat.Description AS Description,
      DocSrcCat.IdCategoryParent AS IdCategoryParent,
      Biblio_Categories.Description AS DescriptionParent
   FROM (
      SELECT
         Biblio_Documents.IdDocument AS IdDocument,
         Biblio_SourcesCategories.IdCategory AS IdCategory,
         Biblio_Categories.Description AS Description,
         Biblio_Categories.IdCategoryParent AS IdCategoryParent
      FROM Biblio_Documents
      JOIN Biblio_SourcesCategories
         ON Biblio_Documents.IdSource=Biblio_SourcesCategories.IdSource
      JOIN Biblio_Categories
         ON Biblio_SourcesCategories.IdCategory=Biblio_Categories.IdCategory
   ) AS DocSrcCat
   JOIN Biblio_Categories
      ON DocSrcCat.IdCategoryParent=Biblio_Categories.IdCategory;
TRUE on success.
lbsConnect, lbsClear, Scopus_ImportSources, lbsTidy
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
lbsCreate(conn);
Scopus_ImportSources(conn);
## ...
lbsDisconnect(conn);
## End(Not run)
Deletes author, citation, document, and survey information from a Local Bibliometric Storage.
lbsDeleteAllAuthorsDocuments(conn, verbose = TRUE)
conn | database connection object, see lbsConnect. |
verbose |
logical; |
For safety reasons, an SQL transaction opened at the beginning of the removal process is not committed (closed) automatically. You should commit it manually (or roll it back), see Examples below.
TRUE on success.
lbsClear, dbCommit, dbRollback
## Not run:
conn <- lbsConnect("Bibliometrics.db")
lbsDeleteAllAuthorsDocuments(conn)
dbCommit(conn)
## ...
lbsDisconnect(conn)
## End(Not run)
Performs a preliminary analysis of the data in a Local Bibliometric Storage by computing basic descriptive statistics (numeric and graphical). The dataset may be restricted to given document types or to a single survey.
lbsDescriptiveStats( conn, documentTypes = NULL, surveyDescription = NULL, which = (1L:7L), main = "", ask = (prod(par("mfcol")) < length(which) && dev.interactive()), ..., cex.caption = 1 )
conn | connection object, see lbsConnect. |
documentTypes |
character vector or |
surveyDescription |
single character string or |
which |
numeric vector with elements in 1,...,7, or |
main |
title for each plot. |
ask |
logical; if |
... |
additional graphical parameters, see |
cex.caption |
controls size of default captions. |
Plot types (accessed with which):
1: "Document types",
2: "Publication years",
3: "Citations per document",
4: "Citations of cited documents per type",
5: "Number of pages per document type",
6: "Categories of documents" (based on source categories),
7: "Documents per author".
Note that this user interaction scheme is similar in behavior to the plot.lm function.
plot.default, lbsConnect
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
lbsDescriptiveStats(conn, surveyDescription="Scientometrics",
   documentTypes=c("Article", "Note", "Report", "Review", "Short Survey"));
## ...
lbsDisconnect(conn);
## End(Not run)
Disconnects from a Local Bibliometric Storage.
lbsDisconnect(conn)
conn | database connection object, see lbsConnect. |
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
lbsDisconnect(conn);
## End(Not run)
Creates ordered citation sequences of authors in a Local Bibliometric Storage.
lbsGetCitations( conn, documentTypes = NULL, surveyDescription = NULL, idAuthors = NULL, verbose = TRUE )
conn | a connection object as produced by lbsConnect. |
documentTypes |
character vector or |
surveyDescription |
single character string or |
idAuthors |
numeric vector of authors' identifiers for which the sequences are to be created or |
verbose |
logical; |
A citation sequence is a numeric vector consisting of the citation counts of all the documents mapped to a selected author.
The function may, however, be restricted to documents from a given survey (the surveyDescription parameter) or of chosen types (documentTypes).
A list of non-increasingly ordered numeric vectors is returned. Each element of the list corresponds to the citation sequence of one author. The names attribute of the list is set to the authors' names. Moreover, each vector has an IdAuthor attribute, which uniquely identifies the corresponding record in the table Biblio_Authors.
Citation counts come together with IdDocuments (vector elements are named).
The list of citation sequences may then be used to calculate authors' impact using lbsAssess (see Examples below).
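The structure of the returned value can be reproduced by hand in base R, which may help when testing impact functions without a database. The helper make_citseq is hypothetical; the identifiers and counts are taken from the printed output in the Examples.

```r
# Citation counts named by IdDocument, in arbitrary order.
counts <- c("34128" = 4, "40116" = 11, "29672" = 0, "39122" = 1)

# Hypothetical helper: sort non-increasingly (names are kept)
# and attach the author identifier as an attribute.
make_citseq <- function(counts, idAuthor) {
  x <- sort(counts, decreasing = TRUE)
  attr(x, "IdAuthor") <- idAuthor
  x
}

# The list is named by author, one element per author.
citseq <- list("Liu X." = make_citseq(counts, 39264))
```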
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
citseq <- lbsGetCitations(conn,
   surveyDescription="Scientometrics", documentTypes="Article",
   idAuthors=c(39264,39265,39266));
print(citseq);
## $`Liu X.`                                 # Author name
## 40116 34128 39122 29672 32343 32775       # IdDocument
##    11     4     1     0     0     0       # Citation count
## attr(,"IdAuthor")
## [1] 39264                                 # IdAuthor
##
## $`Xu Y.`
## 38680 38605 40035 40030 40124 39829 39745 29672
##    30    14     8     6     6     5     3     0
## attr(,"IdAuthor")
## [1] 39265
##
## $`Wang Y.`
## 29992 29672 29777 32906 33858 33864 34704
##     1     0     0     0     0     0     0
## attr(,"IdAuthor")
## [1] 39266
print(lbsAssess(citseq,
   f=list(length, sum, index.h, index.g, function(x) index.rp(x,1),
      function(x) sqrt(prod(index.lp(x,1))),
      function(x) sqrt(prod(index.lp(x,Inf)))),
   captions=c("length", "sum", "index.h", "index.g", "index.w",
      "index.lp1", "index.lpInf")));
##      Name length sum index.h index.g index.w index.lp1 index.lpInf
## 3   Xu Y.      8  72       5       8       7  8.573214    5.477226
## 2 Wang Y.      7   1       1       1       1  1.000000    1.000000
## 1  Liu X.      6  16       2       4       3  4.157609    3.316625
## ...
dbDisconnect(conn);
## End(Not run)
Retrieves basic information on given authors.
lbsGetInfoAuthors(conn, idAuthors)
conn | a connection object as produced by lbsConnect. |
idAuthors |
a numeric or integer vector with author identifiers (see column |
A list of authorinfo objects, that is, lists with the following components:
IdAuthor: numeric; author's identifier in the table Biblio_Authors,
Name: character; author's name,
AuthorGroup: character; author group (used to merge author records).
lbsSearchAuthors, lbsSearchDocuments, lbsGetInfoDocuments, as.character.authorinfo, print.authorinfo
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
id <- lbsSearchAuthors(conn, c("Smith\
lbsGetInfoAuthors(conn, id);
## ...
## End(Not run)
Retrieves information on given documents.
lbsGetInfoDocuments(conn, idDocuments)
conn | a connection object as produced by lbsConnect. |
idDocuments |
a numeric or integer vector with document identifiers (see column |
A list of docinfo objects, that is, lists with the following components:
IdDocument: numeric; document identifier in the table Biblio_Documents,
Authors: list of authorinfo objects (see e.g. as.character.authorinfo),
Title: title of the document,
BibEntry: bibliographic entry,
AlternativeId: unique character identifier,
Pages: number of pages,
Citations: number of citations,
Year: publication year,
Type: document type, e.g. Article or Conference Paper.
print.docinfo, lbsSearchDocuments, lbsGetInfoAuthors, as.character.authorinfo, as.character.docinfo
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
id <- lbsSearchDocuments(conn, idAuthors=lbsSearchAuthors(conn, "Knuth\
lbsGetInfoDocuments(conn, id);
## ...
## End(Not run)
Imports bibliographic data from a special 11-column data.frame object (see e.g. Scopus_ReadCSV) into a Local Bibliometric Storage.
lbsImportDocuments( conn, data, surveyDescription = "Default survey", surnameFirstnameCommaSeparated = FALSE, originalFilename = attr(data, "filename"), excludeRows = NULL, updateDocumentIfExists = TRUE, warnSourceTitle = TRUE, warnExactDuplicates = FALSE, verbose = TRUE )
conn | a connection object, see lbsConnect. |
data |
11 column |
surveyDescription |
description of the survey. Allows for documents grouping. |
surnameFirstnameCommaSeparated |
logical; indicates whether surnames are separated from first names (or initials) by comma or by space ( |
originalFilename |
original filename; |
excludeRows |
a numeric vector with row numbers of |
updateDocumentIfExists |
logical; if |
warnSourceTitle |
logical; if |
warnExactDuplicates |
logical; |
verbose |
logical; |
data must consist of the following 11 columns (in order). Otherwise the process will not be executed.
1 | Authors | character | Author(s) name(s), comma-separated, surnames first. |
2 | Title | character | Document title. |
3 | Year | numeric | Year of publication. |
4 | SourceTitle | character | Title of the source containing the document. |
5 | Volume | character | Volume. |
6 | Issue | character | Issue. |
7 | PageStart | numeric | Start page. |
8 | PageEnd | numeric | End page. |
9 | Citations | numeric | Number of citations. |
10 | AlternativeId | character | Alternative document identifier. |
11 | DocumentType | factor | Type of the document. |
DocumentType is one of “Article”, “Article in Press”, “Book”, “Conference Paper”, “Editorial”, “Erratum”, “Letter”, “Note”, “Report”, “Review”, “Short Survey”, or NA (other categories are interpreted as NA).
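A minimal sketch of a data frame with the required layout, built by hand in base R. All field values here are made up for illustration (only the author string follows the format quoted in the Scopus_ReadCSV documentation), and whether lbsImportDocuments accepts such a hand-built object without further attributes is an assumption.

```r
# One hypothetical record with the 11 columns in the required order.
data <- data.frame(
  Authors       = "Smith John, Nowak G. W.",
  Title         = "A hypothetical document title",
  Year          = 2010,
  SourceTitle   = "A hypothetical journal",
  Volume        = "1",
  Issue         = "2",
  PageStart     = 10,
  PageEnd       = 20,
  Citations     = 3,
  AlternativeId = "HYPOTHETICAL-ID-1",
  DocumentType  = factor("Article"),
  stringsAsFactors = FALSE
)

# Check the column order before attempting an import.
expected <- c("Authors", "Title", "Year", "SourceTitle", "Volume", "Issue",
              "PageStart", "PageEnd", "Citations", "AlternativeId",
              "DocumentType")
layout_ok <- identical(names(data), expected)
```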
Note that if data
contains a large number of records (>1000),
the whole process may take a few minutes.
Sources (e.g. journals) are identified by SourceTitle (table Biblio_Sources).
Note that there is generally no need to be concerned about missing SourceTitles of conference proceedings.
Each time the function is called, a new record is created in the table Biblio_Surveys. Such surveys may be grouped using the Description field, see lbsCreate.
TRUE on success.
Scopus_ReadCSV, lbsConnect, lbsCreate
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
data <- Scopus_ReadCSV("db_Polish_MATH/Poland_MATH_1987-1993.csv");
lbsImportDocuments(conn, data, "Poland_MATH");
## ...
lbsDisconnect(conn);
## End(Not run)
Finds authors by name.
lbsSearchAuthors(conn, names.like = NULL, group = NULL)
conn | connection object, see lbsConnect. |
names.like |
character vector of SQL-LIKE patterns to match authors' names. |
group |
character vector of author group identifiers. |
names.like is a set of search patterns in the SQL LIKE format, i.e. an underscore _ matches a single character and a percent sign % matches any sequence of characters. The search is case-insensitive.
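To get a feel for these wildcard rules without a database, the matching can be imitated in base R by translating a LIKE pattern into a regular expression. Both helpers below are hypothetical and only approximate what the SQLite engine does; the author names are made up.

```r
# Hypothetical helper: translate an SQL LIKE pattern into an anchored regex.
# "%" becomes ".*", "_" becomes ".", every other character is quoted literally.
like_to_regex <- function(pattern) {
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") ".*" else if (ch == "_") "." else paste0("\\Q", ch, "\\E")
  }, character(1))
  paste0("^", paste(parts, collapse = ""), "$")
}

# Case-insensitive match of names against one LIKE pattern.
like_match <- function(names, pattern) {
  grepl(like_to_regex(pattern), names, ignore.case = TRUE, perl = TRUE)
}

authors <- c("Smith J.", "SMITHSON A.", "Knuth D.E.", "Nowak G.")
m1 <- like_match(authors, "smith%")      # prefix search
m2 <- like_match(authors, "Knuth D_E_")  # single-character wildcards
```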
Integer vector of authors' identifiers which match at least one of given SQL-LIKE patterns.
lbsGetInfoAuthors, lbsSearchDocuments, lbsGetInfoDocuments
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
id <- lbsSearchAuthors(conn, c("Smith\
lbsGetInfoAuthors(conn, id);
## ...
## End(Not run)
Searches for documents meeting given criteria (e.g. document titles, documents' authors identifiers, number of citations, number of pages, publication years or document types).
lbsSearchDocuments( conn, titles.like = NULL, idAuthors = NULL, citations.expr = NULL, pages.expr = NULL, year.expr = NULL, documentTypes = NULL, alternativeId = NULL, surveyDescription = NULL )
conn | connection object, see lbsConnect. |
titles.like |
character vector of SQL-LIKE patterns to match documents' titles or |
idAuthors |
numeric or integer vector with author identifiers (see column |
citations.expr |
expression determining the desired number of citations or |
pages.expr |
expression determining the desired number of pages or |
year.expr |
expression determining the desired publication year or |
documentTypes |
character vector or |
alternativeId |
character vector of documents' AlternativeIds. |
surveyDescription |
single character string or |
titles.like is a set of search patterns in the SQL LIKE format, i.e. an underscore _ matches a single character and a percent sign % matches any sequence of characters. The search is case-insensitive.
The expressions passed as the parameters citations.expr, pages.expr, and year.expr must be acceptable in an SQL WHERE clause of the form WHERE field <expression>, see Examples below.
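How such expressions could end up in a query can be sketched with simple string pasting. The helper build_where is hypothetical and is not the package's internal code; the field names Citations, Pages and Year are the columns of Biblio_Documents listed under lbsCreate.

```r
# Hypothetical helper: combine "field <expression>" fragments into a WHERE clause.
build_where <- function(citations.expr = NULL, pages.expr = NULL, year.expr = NULL) {
  exprs <- list(Citations = citations.expr, Pages = pages.expr, Year = year.expr)
  exprs <- Filter(Negate(is.null), exprs)  # drop criteria that were not given
  if (length(exprs) == 0) return("")
  paste("WHERE", paste(names(exprs), unlist(exprs), collapse = " AND "))
}

clause <- build_where(pages.expr = ">= 400",
                      year.expr  = "BETWEEN 1970 AND 1972")
```

Because the expressions are pasted verbatim into the statement in this sketch, they should come only from trusted input.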
Integer vector of documents' identifiers matching given criteria.
lbsGetInfoAuthors, lbsSearchAuthors, lbsGetInfoDocuments
## Not run:
conn <- lbsConnect("Bibliometrics.db");
## ...
idd <- lbsSearchDocuments(conn, pages.expr=">= 400",
   year.expr="BETWEEN 1970 AND 1972");
lbsGetInfoDocuments(conn, idd);
## ...
## End(Not run)
Cleans up a Local Bibliometric Storage by removing all authors with no documents, fixing documents with missing survey information, and executing the VACUUM SQL command.
lbsTidy( conn, newSuveyDescription = "lbsTidy_Merged", newSuveyFilename = "lbsTidy_Merged" )
conn | database connection object, see lbsConnect. |
newSuveyDescription |
character; default survey description for documents with missing survey info. |
newSuveyFilename |
character; default survey filename for documents with missing survey info. |
TRUE on success.
lbsConnect, lbsCreate, Scopus_ImportSources, lbsDeleteAllAuthorsDocuments, dbCommit, dbRollback
Prints out an object of class authorinfo. Such an object is returned by e.g. lbsGetInfoAuthors.
## S3 method for class 'authorinfo' print(x, ...)
x | an object of class authorinfo. |
... | unused. |
For more information see the man page for as.character.authorinfo.
as.character.authorinfo, lbsSearchAuthors, lbsGetInfoAuthors
Prints out an object of class docinfo. Such an object is returned by e.g. lbsGetInfoDocuments.
## S3 method for class 'docinfo' print(x, ...)
x | an object of class docinfo. |
... | unused. |
For more information see the man page for as.character.docinfo.
as.character.docinfo, lbsSearchDocuments, lbsGetInfoDocuments
List of Elsevier's SciVerse Scopus ASJC (All Science Journal Classification) source classification codes.
Scopus_ASJC
An object of class NULL
of length 0.
Last update: October 2011. The data file is based on the official and publicly available (no permission needed as stated by Elsevier) Scopus list of covered titles.
It consists of 334 ASJC 4-digit integer codes (column ASJC) together with their group identifiers (column ASJC_Parent) and descriptions (column Description).
ASJC codes are used to classify Scopus sources (see Scopus_SourceList).
Scopus_SourceList, Scopus_ReadCSV, Scopus_ImportSources
Imports SciVerse Scopus covered titles and their ASJC codes to an empty Local Bibliometric Storage (LBS).
Scopus_ImportSources(conn, verbose = T)
conn | a connection object, see lbsConnect. |
verbose |
logical; |
This function should be called prior to importing any document information to the LBS with the function lbsImportDocuments.
Note that adding all the sources takes some time.
Only elementary ASJC and SciVerse Scopus source data read from Scopus_ASJC and Scopus_SourceList will be added to the LBS (Biblio_Categories, Biblio_Sources, Biblio_SourcesCategories).
TRUE on success.
Scopus_ASJC, Scopus_SourceList, Scopus_ReadCSV, lbsConnect, lbsCreate
## Not run:
conn <- lbsConnect("Bibliometrics.db")
lbsCreate(conn)
Scopus_ImportSources(conn)
## ...
lbsDisconnect(conn)
## End(Not run)
Reads bibliography entries from a UTF-8 encoded CSV file.
Scopus_ReadCSV( filename, stopOnErrors = TRUE, dbIdentifier = "Scopus", alternativeIdPattern = "^.*\\id=|\\&.*$", ... )
filename | the name of the file which the data are to be read from, see read.csv. |
stopOnErrors | logical; |
dbIdentifier | character or |
alternativeIdPattern | character; regular expression used to extract AlternativeId, |
... | further arguments to be passed to read.csv. |
The read.csv function is used to read the bibliography. You may therefore freely modify its behavior by passing further arguments (...); see the manual page of read.table for details.
The CSV file should contain at least the following columns:
Authors: Author name(s) (surname first; multiple names are comma-separated, e.g. “Smith John, Nowak G. W.”),
Title: Document title,
Year: Year of publication,
Source.title: Source title, e.g. journal name,
Volume: Volume number,
Issue: Issue number,
Page.start: Start page number,
Page.end: End page number,
Cited.by: Number of citations received,
Link: String containing a unique document identifier, by default of the form ...id=UNIQUE_ID&... (see the alternativeIdPattern parameter),
Document.Type: Document type, one of: “Article”, “Article in Press”, “Book”, “Conference Paper”, “Editorial”, “Erratum”, “Letter”, “Note”, “Report”, “Review”, “Short Survey”, or NA (other categories are treated as NAs),
Source: Data source identifier; must be the same as the dbIdentifier parameter value. It is used to detect parse errors.
The CSV file to be read may, for example, be created by SciVerse Scopus (Export format=comma separated file, .csv (e.g. Excel), Output=Complete format or Citations only). Note that the exported CSV file sometimes needs to be corrected by hand (wrong page numbers, a single double-quote character inside a character string instead of a doubled one, etc.). We suggest making the corrections in a plain-text editor such as Notepad. The function tries to indicate the line numbers that cause potential problems.
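To make the expected column layout concrete, here is a minimal base-R sketch that parses one made-up record and derives an identifier from its Link field. It does not call Scopus_ReadCSV and is not its implementation; the record, the URL, and the simplified pattern "^.*id=|&.*$" (used in place of the package default) are illustrative assumptions.

```r
# One made-up record with the column headers assumed to appear in an export.
csv_text <- c(
  paste0("Authors,Title,Year,Source title,Volume,Issue,Page start,Page end,",
         "Cited by,Link,Document Type,Source"),
  paste0('"Smith John, Nowak G. W.","A toy title",2010,"Toy Journal",1,2,10,20,5,',
         '"http://example.com/record?id=TOY-0001&origin=x",Article,Scopus')
)

# read.csv() turns spaces in header names into dots (e.g. "Source.title").
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Strip everything up to "id=" and everything from the next "&" onwards.
# Simplified stand-in for the alternativeIdPattern default (an assumption).
alt_id <- gsub("^.*id=|&.*$", "", raw$Link)

# Document types outside the documented list are mapped to NA.
known_types <- c("Article", "Article in Press", "Book", "Conference Paper",
                 "Editorial", "Erratum", "Letter", "Note", "Report",
                 "Review", "Short Survey")
doc_type <- ifelse(raw$Document.Type %in% known_types,
                   raw$Document.Type, NA_character_)

# Rows whose Source field differs from the expected identifier hint at
# a parse problem (e.g. a misplaced quote shifting the columns).
suspicious_rows <- which(raw$Source != "Scopus")

print(alt_id)
print(doc_type)
```

A non-empty suspicious_rows vector would be the cue to open the file in a text editor and inspect the reported lines by hand.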
A data.frame containing the following 11 columns:
Authors | Author name(s), comma-separated, surnames first. |
Title | Document title. |
Year | Year of publication. |
AlternativeId | Unique document identifier. |
SourceTitle | Title of the source containing the document. |
Volume | Volume. |
Issue | Issue. |
PageStart | Start page; numeric. |
PageEnd | End page; numeric. |
Citations | Number of citations; numeric. |
DocumentType | Type of the document; see above. |
The object returned may be imported into a local bibliometric storage via lbsImportDocuments.
Scopus_ASJC, Scopus_SourceList, lbsConnect, Scopus_ImportSources, read.table, lbsImportDocuments
## Not run:
conn <- lbsConnect("Bibliometrics.db")
## ...
data <- Scopus_ReadCSV("db_Polish_MATH/Poland_MATH_1987-1993.csv")
lbsImportDocuments(conn, data, "Poland_MATH")
## ...
lbsDisconnect(conn)
## End(Not run)
List of Elsevier's SciVerse Scopus covered titles (journals, conference proceedings, book series, etc.)
Scopus_SourceList
An object of class NULL of length 0.
Last update: October 2011. The data file is based on the official and publicly available (no permission needed as stated by Elsevier) Scopus list of covered titles.
This data frame consists of 30794 records. It has the following columns.
SourceId | Unique source identifier in SciVerse Scopus (integer). |
Title | Title of the source. |
Status | Status of the source, either Active or Inactive. |
SJR_2009 | SCImago Journal Rank 2009. |
SNIP_2009 | Source Normalized Impact per Paper 2009. |
SJR_2010 | SCImago Journal Rank 2010. |
SNIP_2010 | Source Normalized Impact per Paper 2010. |
SJR_2011 | SCImago Journal Rank 2011. |
SNIP_2011 | Source Normalized Impact per Paper 2011. |
OpenAccess | Type of Open Access, see below. |
Type | Type of the source, see below. |
ASJC | A list of semicolon-separated ASJC classification codes, see Scopus_ASJC. |
OpenAccess is one of DOAJ, Not OA (not Open Access source), OA but not registered, OA registered.
Type is one of Book Series, Conference Proceedings, Journal, Trade Journal.
The data.frame is sorted by Status (Active sources first) and then by SJR_2011 (higher values first).
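The sort order and the semicolon-separated ASJC field can be sketched on a toy data frame. The rows and values below are invented for illustration and only a subset of the documented columns is included; nothing here is taken from the actual Scopus_SourceList data.

```r
# Toy stand-in with a subset of the documented columns (made-up values).
src <- data.frame(
  SourceId = 1:4,
  Title    = c("Toy A", "Toy B", "Toy C", "Toy D"),
  Status   = c("Inactive", "Active", "Active", "Inactive"),
  SJR_2011 = c(0.9, 0.2, 1.5, 0.1),
  ASJC     = c("2600;2602", "1700", "1702;2604", "2600"),
  stringsAsFactors = FALSE
)

# Active sources first (FALSE sorts before TRUE), then higher SJR_2011 first.
ord <- order(src$Status != "Active", -src$SJR_2011)
sorted <- src[ord, ]

# Split the semicolon-separated codes into one (SourceId, ASJC) row per code.
codes <- strsplit(sorted$ASJC, ";", fixed = TRUE)
long <- data.frame(
  SourceId = rep(sorted$SourceId, lengths(codes)),
  ASJC     = as.integer(unlist(codes))
)

print(sorted[, c("SourceId", "Status", "SJR_2011")])
print(long)
```

The long form makes it straightforward to join sources with a code table such as Scopus_ASJC on the ASJC column, for example with merge().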
Scopus_ASJC
, Scopus_ReadCSV
, Scopus_ImportSources