
Sitecore Stream for PDXP – Content Extraction

In the new release, Sitecore introduced a brand-new feature, Content Extraction, which helps extract data from various input sources, including files, URLs, or plain text. The latest package is version 1.3.23.

My demo has two parts: preparation and execution.

Preparation

Let’s say we have a store selling Powerful bars and gels. There can be lots of different products with different flavors, usage, package sizes, and so on. Hence, a template is needed for this kind of item (product).

Then we need a document. For now, the following file formats are supported: PDF, TXT, JPG, JPEG, PNG.

As an example, please refer to the following screenshot; in my case, it’s a PDF document. In real life, you might have a multipage document, but be aware that the file size limit is 3 MB and a document can have at most 30 pages.
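
To avoid a failed upload, it can help to check a document against these limits first. The following is only a minimal sketch in Python and is not part of Sitecore Stream: the function name check_extraction_input is hypothetical, the page count is assumed to be supplied by the caller (for example, from a PDF library of your choice), and "3 MB" is assumed to mean 3 * 1024 * 1024 bytes.

```python
from pathlib import Path
from typing import List, Optional

# Limits as described in the article; the exact byte interpretation of
# "3 MB" is an assumption (3 * 1024 * 1024 bytes).
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".jpg", ".jpeg", ".png"}
MAX_FILE_SIZE_BYTES = 3 * 1024 * 1024
MAX_PAGES = 30


def check_extraction_input(
    file_name: str,
    size_bytes: int,
    page_count: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if none).

    page_count is optional because only paginated formats such as PDF
    have one; the caller is expected to determine it separately.
    """
    problems: List[str] = []

    # Compare the extension case-insensitively against the supported set.
    extension = Path(file_name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {extension or '(none)'}")

    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is larger than 3 MB: {size_bytes} bytes")

    if page_count is not None and page_count > MAX_PAGES:
        problems.append(f"document has more than {MAX_PAGES} pages: {page_count}")

    return problems
```

For example, check_extraction_input("bars.pdf", 1_000_000, 12) returns an empty list, while a 4 MB DOCX file with 31 pages would be reported with three problems.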
