A PHP implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm for extracting relevant keywords from individual documents.
You can install the package via composer:
composer require kudashevs/rake-php
Here is a common usage example:
use Kudashevs\RakePhp\Rake;
$text = "Compatibility of systems of linear constraints over the set of natural numbers.";
$text .= "Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered.";
$text .= "Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given.";
$text .= "These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types of systems and systems of mixed types";
$rake = new Rake();
$keywords = $rake->extract($text);
print_r($keywords);
// will result in
Array
(
[minimal generating sets] => 8.6666666666667
[linear diophantine equations] => 8.5
[minimal supporting set] => 7.6666666666667
[minimal set] => 4.6666666666667
[linear constraints] => 4.5
[natural numbers] => 4
[strict inequations] => 4
[nonstrict inequations] => 4
[upper bounds] => 4
[mixed types] => 3.6666666666667
[considered types] => 3.1666666666667
[set] => 2
[types] => 1.6666666666667
[considered] => 1.5
[compatibility] => 1
[systems] => 1
[criteria] => 1
[system] => 1
[components] => 1
[solutions] => 1
[algorithms] => 1
[construction] => 1
[constructing] => 1
[solving] => 1
)
You can find more information about RAKE and its usage in the original paper.
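The scores in the output above are consistent with the word-scoring scheme described for RAKE: each word scores deg(w) / freq(w), and a phrase scores the sum over its words. For example, "minimal" occurs in three candidate phrases of lengths 2, 3 and 3, giving 8 / 3, and adding 3 for "generating" and 3 for "sets" yields 8.6666666666667. The following is a minimal, self-contained sketch of that scheme and not the package's implementation; the function name rakeSketch, the punctuation set and the two-word stoplist are assumptions made for illustration.

```php
<?php

/**
 * Illustrative RAKE-style scorer (hypothetical helper, not part of the package).
 *
 * @param string   $text      input text
 * @param string[] $stopWords lowercase stop words that delimit candidate phrases
 * @return array<string, float> phrase => score, highest score first
 */
function rakeSketch(string $text, array $stopWords): array
{
    $stop = array_flip($stopWords);
    $phrases = [];

    // Punctuation always ends a candidate phrase (assumed punctuation set).
    foreach (preg_split('/[.,;:!?()"]+/', strtolower($text)) as $fragment) {
        $current = [];
        foreach (preg_split('/\s+/', $fragment, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            if (isset($stop[$word])) {
                // A stop word closes the phrase collected so far.
                if ($current !== []) {
                    $phrases[] = $current;
                }
                $current = [];
            } else {
                $current[] = $word;
            }
        }
        if ($current !== []) {
            $phrases[] = $current;
        }
    }

    // freq(w): number of occurrences; deg(w): summed length of the phrases containing w.
    $freq = [];
    $deg = [];
    foreach ($phrases as $words) {
        foreach ($words as $word) {
            $freq[$word] = ($freq[$word] ?? 0) + 1;
            $deg[$word] = ($deg[$word] ?? 0) + count($words);
        }
    }

    // A phrase scores the sum of deg(w) / freq(w) over its words.
    $scores = [];
    foreach ($phrases as $words) {
        $score = 0.0;
        foreach ($words as $word) {
            $score += $deg[$word] / $freq[$word];
        }
        $scores[implode(' ', $words)] = $score;
    }

    arsort($scores);

    return $scores;
}

$scores = rakeSketch('Minimal generating sets and minimal set of solutions.', ['and', 'of']);
print_r($scores);
```

With this input the sketch produces 8.5 for "minimal generating sets", 4.5 for "minimal set" and 1 for "solutions". The package itself relies on a full stoplist, so its results on real text will differ from this toy setup.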
The Rake class accepts some configuration options:
'modifiers' => [] # A string, an instance or an array of Modifiers
'stoplist' => Stoplist::class # A Stoplist instance that provides a list of stop words
'sorter' => Sorter::class # A Sorter instance that sorts the output of the algorithm
'exclude' => [] # An array of words or regexes that will be excluded from a stoplist
'include' => [] # An array of words or regexes that will be included in a stoplist
Note: the configuration options exclude and include accept simple regexes.
Note: the configuration option exclude has a higher priority than the include option.
Note: At the moment of instantiation, the Rake class can throw an InvalidOptionType exception. This exception extends the built-in InvalidArgumentException class, so it is easy to handle.
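Because the exception extends InvalidArgumentException, a catch block for the built-in type is enough. The sketch below shows this relationship with stand-ins only: the local InvalidOptionType class and the validateOptions function are hypothetical and merely mimic a constructor that rejects a badly typed option. The real class lives in the package and its namespace is not stated here.

```php
<?php

// Stand-in for the package's exception; only the parent class is taken from the text above.
class InvalidOptionType extends InvalidArgumentException
{
}

// Hypothetical validator that mimics a constructor rejecting a badly typed option.
function validateOptions(array $options): array
{
    if (isset($options['exclude']) && !is_array($options['exclude'])) {
        throw new InvalidOptionType('The exclude option must be an array.');
    }

    return $options;
}

$caught = null;

try {
    validateOptions(['exclude' => 42]);
} catch (InvalidArgumentException $e) {
    // Catching the built-in parent type also catches the more specific exception.
    $caught = get_class($e);
}
```

Catching the parent type keeps calling code independent of the package's exception class name.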
The configuration options exclude and include accept regular expressions. The following expressions are currently supported:
.+(ly) - a one-or-more match with grouping
word(s) - a match with alternation at the end of a word
(word|letter) - an alternation of words
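The interplay of the stoplist with include and exclude can be sketched as follows. This is only one reading of the rules above and not the package's code: the helper isStopWord is hypothetical, it assumes patterns are matched against whole words without regard to case, and it assumes patterns do not contain a "/" character.

```php
<?php

/**
 * Decide whether a word acts as a stop word (hypothetical helper, not part of the package).
 *
 * @param string[] $stoplist lowercase stop words
 * @param string[] $include  words or simple regexes added to the stoplist
 * @param string[] $exclude  words or simple regexes removed from the stoplist
 */
function isStopWord(string $word, array $stoplist, array $include, array $exclude): bool
{
    $matchesAny = static function (string $word, array $patterns): bool {
        foreach ($patterns as $pattern) {
            // Anchor the pattern so that it must match the whole word.
            if (preg_match('/^(?:' . $pattern . ')$/i', $word) === 1) {
                return true;
            }
        }

        return false;
    };

    // exclude has a higher priority than include and than the stoplist itself.
    if ($matchesAny($word, $exclude)) {
        return false;
    }

    return $matchesAny($word, $include) || in_array(strtolower($word), $stoplist, true);
}

$stoplist = ['the', 'of'];
$isNearlyStopped = isStopWord('nearly', $stoplist, ['.+(ly)'], []);
$isNearlyKept = !isStopWord('nearly', $stoplist, ['.+(ly)'], ['nearly']);
```

Here "nearly" is treated as a stop word when only the include pattern .+(ly) is present, and it is kept as a keyword candidate once the same word also appears in exclude.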
You can run the tests via composer:
composer test
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Note: Please make sure to update tests as appropriate.
The MIT License (MIT). Please see the License file for more information.