pdf-text-reader

Extracts metadata, text, and styling from a PDF file.

Installation

npm i @vtfk/pdf-text-reader

Usage

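(async () => {
  // Load the package installed above
  const pdfReader = require('@vtfk/pdf-text-reader')

  const pdfPath = './data/examplePdf.pdf'
  try {
    // pdfReader returns a promise that resolves to the parsed PDF data
    const pdfData = await pdfReader(pdfPath)
    console.log(pdfData)
  } catch (error) {
    console.error(error)
  }
})()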

returns:

{
    metadata: { /* some metadata about the pdf */ },
    textContent: [
        {
            str: 'Hello',
            dir: 'ltr',
            width: 6.87792,
            height: 11.04,
            transform: [
                8.87003171296,
                0,
                0,
                9.00064,
                87.8488,
                401.468
            ],
            fontName: 'g_d1_f1',
            page: 1
        },
        ...
    ]
}
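Each entry in textContent is one text fragment tagged with the page it came from. As a minimal sketch (textByPage is a hypothetical helper, not part of the package), the fragments can be collected into plain text per page, assuming the array shape shown above:

```javascript
// Hypothetical helper, not part of @vtfk/pdf-text-reader.
// Collects the `str` of every item into one string per page,
// relying only on the `str` and `page` fields shown in the output above.
function textByPage(textContent) {
  const pages = new Map()
  for (const item of textContent) {
    if (!pages.has(item.page)) pages.set(item.page, [])
    pages.get(item.page).push(item.str)
  }

  // Join each page's fragments with a space, keeping document order
  const result = {}
  for (const [page, parts] of pages) {
    result[page] = parts.join(' ')
  }
  return result
}
```

For example, calling textByPage(pdfData.textContent) on the result from the Usage example returns an object keyed by page number.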

Options

InferLines

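Passing an inferLines option makes the reader also return a lines object containing the text grouped into lines. Its normalizeY setting can be true, a number, false, or undefined, and is set to 300 by default.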

(async () => {
  const pdfReader = require('@vtfk/pdf-text-reader')

  const pdfPath = './data/examplePdf.pdf'
  try {
    const options = {
      inferLines: {
        normalizeY: true // can be true, a number, false, or undefined
      }
    }
    const pdfData = await pdfReader(pdfPath, options)
    console.log(pdfData)
  } catch (error) {
    console.error(error)
  }
})()

returns:

{
    metadata: { /* some metadata about the pdf */ },
    textContent: [
        {
            str: 'Hello',
            dir: 'ltr',
            width: 6.87792,
            height: 11.04,
            transform: [
                8.87003171296,
                0,
                0,
                9.00064,
                87.8488,
                401.468
            ],
            fontName: 'g_d1_f1',
            page: 1
        },
        ...
    ],
    lines: {
        '300': 'Hello , this is the first l ine'
    }
}
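The README does not describe how inferLines builds its lines, so the following is only an illustration of one common approach, not the library's algorithm. It assumes the items follow the PDF.js TextItem layout (which the fields above match), where transform[4] and transform[5] are the x and y position of the fragment. groupLinesByY is a hypothetical helper that treats fragments with the same rounded y on the same page as one line:

```javascript
// Hypothetical sketch, NOT the inferLines implementation of this package.
// Assumes PDF.js-style items: transform[4] is x, transform[5] is y.
function groupLinesByY(textContent) {
  const lines = new Map()
  for (const item of textContent) {
    const x = item.transform[4]
    // Round y so fragments on (nearly) the same baseline share a key
    const y = Math.round(item.transform[5])
    const key = `${item.page}:${y}`
    if (!lines.has(key)) lines.set(key, [])
    lines.get(key).push({ x, str: item.str })
  }

  const result = {}
  for (const [key, fragments] of lines) {
    // Order fragments left to right, then join them with a single space
    fragments.sort((a, b) => a.x - b.x)
    result[key] = fragments.map((f) => f.str).join(' ')
  }
  return result
}
```

Rounding y is a crude tolerance: baselines that differ by a fraction of a point but round to different integers still end up on separate lines.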
