A transformer that uses the Mozilla Readability library to extract the main content from a web page.

const loader = new CheerioWebBaseLoader("https://example.com/article");
const docs = await loader.load();

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 5000,
});
const transformer = new MozillaReadabilityTransformer();

// The sequence processes the loaded documents through the splitter and then the transformer.
const sequence = splitter.pipe(transformer);

// Invoke the sequence to transform the documents into a more readable format.
const newDocuments = await sequence.invoke(docs);

console.log(newDocuments);
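
The ordering produced by pipe can be illustrated without any library code. The following is a minimal sketch under stated assumptions: Doc, Step, pipe, makeSplitter, and trimTransformer are hypothetical stand-ins written for this illustration, not LangChain classes, and the steps are synchronous to keep the example short (the real invoke call is awaited).

```typescript
// Hypothetical stand-ins for illustration; these are not the LangChain classes.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

type Step = (docs: Doc[]) => Doc[];

// Compose two steps so that `first` runs before `second`, mirroring first.pipe(second).
function pipe(first: Step, second: Step): Step {
  return (docs) => second(first(docs));
}

// Stand-in splitter: cuts each document into chunks of at most `chunkSize` characters.
function makeSplitter(chunkSize: number): Step {
  return (docs) => {
    const chunks: Doc[] = [];
    for (const doc of docs) {
      for (let i = 0; i < doc.pageContent.length; i += chunkSize) {
        chunks.push({
          pageContent: doc.pageContent.slice(i, i + chunkSize),
          metadata: { ...doc.metadata },
        });
      }
    }
    return chunks;
  };
}

// Stand-in transformer: maps every document one-to-one (here it only trims whitespace).
const trimTransformer: Step = (docs) =>
  docs.map((doc) => ({ ...doc, pageContent: doc.pageContent.trim() }));

const sketchSequence = pipe(makeSplitter(5), trimTransformer);
const sketchResult = sketchSequence([
  { pageContent: "abcde fghi", metadata: { source: "demo" } },
]);

console.log(sketchResult.map((d) => d.pageContent));
```

Because the splitter runs first, the second step receives one document per chunk rather than one per page. Swapping the arguments to pipe would apply the one-to-one transformation to whole documents before any splitting takes place.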

Hierarchy

  • MappingDocumentTransformer
    • MozillaReadabilityTransformer

Constructors

Properties

options: Options = {}
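
The = {} in the signature above means the options argument may be omitted. A minimal sketch of that behaviour, assuming a hypothetical SketchOptions type and SketchTransformer class written only for this illustration (the field names charThreshold and keepClasses are borrowed from the Readability library's options as an assumption, and nothing here calls the real transformer):

```typescript
// Hypothetical subset of options, for illustration only.
interface SketchOptions {
  charThreshold?: number;
  keepClasses?: boolean;
}

// Hypothetical stand-in class; it only stores the options it is given.
class SketchTransformer {
  // Omitting the argument yields an empty options object.
  constructor(readonly options: SketchOptions = {}) {}
}

const withDefaults = new SketchTransformer();
const customized = new SketchTransformer({ charThreshold: 200 });

console.log(withDefaults.options, customized.options);
```

With this pattern, code that reads the options never has to check for undefined, since an omitted argument becomes an empty object.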