R packages by gagolews

stringi - Fast and Portable Character String Processing Facilities

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).

Last updated 1 months ago

icuicu4cnatural-language-processingnlpregexregexpstring-manipulationstringistringrtexttext-processingtidy-dataunicodecpp

18.31 score 309 stars 8.6k dependents 10k scripts 888k downloads

FuzzyNumbers - Tools to Deal with Fuzzy Numbers

S4 classes and methods to deal with fuzzy numbers. They allow for computing any arithmetic operations (e.g., by using the Zadeh extension principle), performing approximation of arbitrary fuzzy numbers by trapezoidal and piecewise linear ones, preparing plots for publications, computing possibility and necessity values for comparisons, etc.

Last updated 3 years ago

7.37 score 10 stars 17 dependents 91 scripts 794 downloads

genieclust - Fast and Robust Hierarchical Clustering with Noise Points Detection

A retake on the Genie algorithm (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>), which is a robust hierarchical clustering method (Gagolewski, Bartoszuk, Cena, 2016 <DOI:10.1016/j.ins.2016.05.003>). It is now faster and more memory efficient; determining the whole cluster hierarchy for datasets of 10M points in low dimensional Euclidean spaces or 100K points in high-dimensional ones takes only a minute or so. Allows clustering with respect to mutual reachability distances so that it can act as a noise point detector or a robustified version of 'HDBSCAN*' (that is able to detect a predefined number of clusters and hence it does not dependent on the somewhat fragile 'eps' parameter). The package also features an implementation of inequality indices (e.g., Gini and Bonferroni), external cluster validity measures (e.g., the normalised clustering accuracy, the adjusted Rand index, the Fowlkes-Mallows index, and normalised mutual information), and internal cluster validity indices (e.g., the Calinski-Harabasz, Davies-Bouldin, Ball-Hall, Silhouette, and generalised Dunn indices). See also the 'Python' version of 'genieclust' available on 'PyPI', which supports sparse data, more metrics, and even larger datasets.

Last updated 14 hours ago

cluster-analysisclusteringclustering-algorithmdata-analysisdata-miningdata-sciencegeniehdbscanhierarchical-clusteringhierarchical-clustering-algorithmmachine-learningmachine-learning-algorithmsmlpacknmslibpythonpython3sparsecppopenmp

7.29 score 61 stars 5 dependents 13 scripts 820 downloads

TurtleGraphics - Turtle Graphics

An implementation of turtle graphics <http://en.wikipedia.org/wiki/Turtle_graphics>. Turtle graphics comes from Papert's language Logo and has been used to teach concepts of computer programming.

Last updated 3 years ago

7.21 score 23 stars 2 dependents 117 scripts 342 downloads

agop - Aggregation Operators and Preordered Sets

Tools supporting multi-criteria and group decision making, including variable number of criteria, by means of aggregation operators, spread measures, fuzzy logic connectives, fusion functions, and preordered sets. Possible applications include, but are not limited to, quality management, scientometrics, software engineering, etc.

Last updated 1 years ago

aggregationcpp

5.06 score 5 stars 2 dependents 77 scripts 436 downloads

stringx - Replacements for Base String Functions Powered by 'stringi'

English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.

Last updated 2 months ago

icuicu4cnatural-language-processingnlpregexregexpstring-manipulationstringitexttext-processingunicode

4.75 score 28 stars 1 scripts 507 downloads

genie - Fast, Robust, and Outlier Resistant Hierarchical Clustering

Includes the reference implementation of Genie - a hierarchical clustering algorithm that links two point groups in such a way that an inequity measure (namely, the Gini index) of the cluster sizes does not significantly increase above a given threshold. This method most often outperforms many other data segmentation approaches in terms of clustering quality as tested on a wide range of benchmark datasets. At the same time, Genie retains the high speed of the single linkage approach, therefore it is also suitable for analysing larger data sets. For more details see (Gagolewski et al. 2016 <DOI:10.1016/j.ins.2016.05.003>). For an even faster and more feature-rich implementation, including, amongst others, noise point detection, see the 'genieclust' package (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>).

Last updated 3 years ago

clustercluster-analysisclusteringdata-analysisdata-miningdata-sciencedatasciencegeniehierarchical-clustering-algorithmmachine-learningmachine-learning-algorithmsoutlierscppopenmp

4.55 score 22 stars 16 scripts 497 downloads

CITAN - CITation ANalysis Toolpack

Supports quantitative research in scientometrics and bibliometrics. Provides various tools for preprocessing bibliographic data retrieved, e.g., from Elsevier's SciVerse Scopus, computing bibliometric impact of individuals, or modelling phenomena encountered in the social sciences. This package is deprecated, see 'agop' instead.

Last updated 3 years ago

3.82 score 6 stars 22 scripts 314 downloads

realtest - Where Expectations Meet Reality: Realistic Unit Testing

A framework for unit testing for realistic minimalists, where we distinguish between expected, acceptable, current, fallback, ideal, or regressive behaviour. It can also be used for monitoring third-party software projects for changes.

Last updated 8 months ago

continuous-testingtesting-toolsunit-testing

3.74 score 11 stars 302 downloads