stringi - Fast and Portable Character String Processing Facilities
A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).
Last updated 6 months ago
icu, icu4c, natural-language-processing, nlp, regex, regexp, string-manipulation, stringi, stringr, text, text-processing, tidy-data, unicode, cpp
17.90 score 307 stars 8.4k dependents 10k scripts 600k downloads

FuzzyNumbers - Tools to Deal with Fuzzy Numbers
S4 classes and methods to deal with fuzzy numbers. They allow for computing any arithmetic operations (e.g., by using the Zadeh extension principle), performing approximation of arbitrary fuzzy numbers by trapezoidal and piecewise linear ones, preparing plots for publications, computing possibility and necessity values for comparisons, etc.
Last updated 3 years ago
7.37 score 10 stars 17 dependents 91 scripts 509 downloads

TurtleGraphics - Turtle Graphics
An implementation of turtle graphics <http://en.wikipedia.org/wiki/Turtle_graphics>. Turtle graphics comes from Papert's language Logo and has been used to teach concepts of computer programming.
Last updated 2 years ago
7.21 score 23 stars 2 dependents 117 scripts 213 downloads

genieclust - Fast and Robust Hierarchical Clustering with Noise Points Detection
A retake on the Genie algorithm (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>) - a robust hierarchical clustering method (Gagolewski, Bartoszuk, Cena, 2016 <DOI:10.1016/j.ins.2016.05.003>). Now faster and more memory efficient; determining the whole hierarchy for datasets of 10M points in low-dimensional Euclidean spaces or 100K points in high-dimensional ones takes only 1-2 minutes. Allows clustering with respect to mutual reachability distances so that it can act as a noise point detector or a robustified version of 'HDBSCAN*' (that is able to detect a predefined number of clusters and hence does not depend on the somewhat fragile 'eps' parameter). The package also features an implementation of inequality indices (the Gini and Bonferroni indices), external cluster validity measures (e.g., the normalised clustering accuracy and partition similarity scores such as the adjusted Rand, Fowlkes-Mallows, adjusted mutual information, and the pair sets index), and internal cluster validity indices (e.g., the Calinski-Harabasz, Davies-Bouldin, Ball-Hall, Silhouette, and generalised Dunn indices). See also the 'Python' version of 'genieclust' available on 'PyPI', which supports sparse data, more metrics, and even larger datasets.
Last updated 2 months ago
cluster-analysis, clustering, clustering-algorithm, data-analysis, data-mining, data-science, genie, hdbscan, hierarchical-clustering, hierarchical-clustering-algorithm, machine-learning, machine-learning-algorithms, mlpack, nmslib, python, python3, sparse, cpp, openmp
6.80 score 59 stars 5 dependents 12 scripts 585 downloads

agop - Aggregation Operators and Preordered Sets
Tools supporting multi-criteria and group decision making, including variable number of criteria, by means of aggregation operators, spread measures, fuzzy logic connectives, fusion functions, and preordered sets. Possible applications include, but are not limited to, quality management, scientometrics, software engineering, etc.
Last updated 1 year ago
aggregation, cpp
5.06 score 5 stars 2 dependents 77 scripts 307 downloads

stringx - Replacements for Base String Functions Powered by 'stringi'
English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.
Last updated 9 days ago
icu, icu4c, natural-language-processing, nlp, regex, regexp, string-manipulation, stringi, text, text-processing, unicode
4.75 score 28 stars 2 scripts 294 downloads

genie - Fast, Robust, and Outlier Resistant Hierarchical Clustering
Includes the reference implementation of Genie - a hierarchical clustering algorithm that links two point groups in such a way that an inequity measure (namely, the Gini index) of the cluster sizes does not significantly increase above a given threshold. This method most often outperforms many other data segmentation approaches in terms of clustering quality as tested on a wide range of benchmark datasets. At the same time, Genie retains the high speed of the single linkage approach, therefore it is also suitable for analysing larger data sets. For more details see (Gagolewski et al. 2016 <DOI:10.1016/j.ins.2016.05.003>). For an even faster and more feature-rich implementation, including, amongst others, noise point detection, see the 'genieclust' package (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>).
Last updated 2 years ago
cluster, cluster-analysis, clustering, data-analysis, data-mining, data-science, datascience, genie, hierarchical-clustering-algorithm, machine-learning, machine-learning-algorithms, outliers, cpp, openmp
4.55 score 22 stars 16 scripts 227 downloads

CITAN - CITation ANalysis Toolpack
Supports quantitative research in scientometrics and bibliometrics. Provides various tools for preprocessing bibliographic data retrieved, e.g., from Elsevier's SciVerse Scopus, computing bibliometric impact of individuals, or modelling phenomena encountered in the social sciences. This package is deprecated, see 'agop' instead.
Last updated 3 years ago
3.82 score 6 stars 22 scripts 290 downloads

realtest - Where Expectations Meet Reality: Realistic Unit Testing
A framework for unit testing for realistic minimalists, where we distinguish between expected, acceptable, current, fallback, ideal, or regressive behaviour. It can also be used for monitoring third-party software projects for changes.
Last updated 6 months ago
continuous-testing, testing-tools, unit-testing
3.74 score 11 stars 254 downloads