Update README
davidmezzetti committed Dec 13, 2024
1 parent d8152a4 commit ff2c5f3
Showing 1 changed file with 20 additions and 1 deletion.
21 changes: 20 additions & 1 deletion README.md
@@ -72,6 +72,8 @@ The following section gives an overview of highlighters and available methods/co

### Create a new highlighter

Creates a new highlighter instance.

```python
from txtmarker.factory import Factory
highlighter = Factory.create("pdf")
```
@@ -100,8 +102,25 @@ chunks: int
Splits queries into multiple chunks. This is designed for very long text matches.

### Page text

Extracts page text from `infile` and returns it as a generator. This enables analysis of the text exactly as it will appear to the highlighter.

```python
highlighter.pages("input.pdf")
```

#### infile
```yaml
infile: string
```

Full path to input file
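
Because `pages` returns a generator, page text can be scanned lazily before deciding what to highlight. The following is a minimal sketch of that idea, not part of the library: `fake_pages` is a hypothetical stand-in for `highlighter.pages("input.pdf")`, and it assumes each yielded item is one page's text as a string. `pages_matching` is likewise an illustrative helper.

```python
import re


def fake_pages():
    """Hypothetical stand-in for highlighter.pages("input.pdf").

    Assumption: the real generator yields one text string per page.
    """
    yield "Introduction to text highlighting"
    yield "Methods: regular expressions and fuzzy search"
    yield "Results and discussion"


def pages_matching(pages, pattern):
    """Return the 0-based indexes of pages whose text matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [index for index, text in enumerate(pages) if regex.search(text)]


# Check which pages contain the query before building highlights
matches = pages_matching(fake_pages(), r"regular\s+expressions")
```

Checking a query against the extracted text first can help explain why an expected highlight is not found, since the text seen by the highlighter may differ from what a PDF viewer displays.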

### Highlight text

Highlights text using the provided annotations. The annotated file is stored as `outfile`.

```python
highlighter.highlight("input.pdf", "output.pdf", [("name", "text to highlight")])
```
@@ -125,4 +144,4 @@ Full path to output file, i.e. the highlighted file
```yaml
highlights: list of (string, string|regex)
```

List of highlight elements. Each pair has a name (can be None) and a text value. The text can be either a string or a regular expression. When matching a literal string, escape any regular expression special characters first (e.g. call `re.escape`).
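
The escaping advice can be illustrated with the standard library alone. This is a sketch under assumptions: the sample text and highlight names are made up, and it assumes patterns are passed to the highlighter as pattern strings rather than compiled objects.

```python
import re

# Literal text containing characters that are special in regular expressions
literal = "Total cost (USD): $1.50+"

# Unescaped, the parentheses, "$", "." and "+" are interpreted as regex syntax,
# so the pattern does not match its own literal text
unescaped_match = re.search(literal, literal)

# re.escape backslash-escapes the special characters so the text matches literally
escaped = re.escape(literal)
escaped_match = re.search(escaped, "Invoice. Total cost (USD): $1.50+ tax")

# Highlight pairs: (name, text). The first is an escaped literal,
# the second is an intentional regular expression
highlights = [
    ("cost", escaped),
    ("year", r"\b(19|20)\d{2}\b"),
]
```

The resulting list has the same shape as the `highlights` argument shown in the `highlight` call above.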
