HTML tags in Squirro item body

In an older version of Squirro, I saw that Squirro was adding
HTML tags to the item body: <html><body> actual body content </body></html>

But in the newer version I do not see these HTML tags.

Can someone explain what Squirro does with the item body?

We only have the Content Standardization step; the rest of the steps in our pipeline workflow are custom.

Welcome Adarsh! Great to see you here!

Hi @adarsh

That markup would likely come from the “Sanitize HTML” step in the pipeline.


Hi @adarsh,

As @pneff has already mentioned, this is controlled by the Sanitize HTML step. This step was introduced in v3.3.3 of Squirro, along with the Remove HTML step. Internally, both were extracted from the Content Standardization step. Therefore, you need to add those two steps to your workflow in order to get the same behaviour as before.


According to the documentation I found, 3.3.9 was the latest customer/partner-supported release (LTS) in which this feature was available.

  • Introduce three new content processing steps, which replace the previous Content Standardization pipeline step:
    • Sanitize HTML: cleans HTML document of potentially malicious HTML tags
    • Remove HTML: removes HTML from fields
    • Content Standardization: makes sure that the item has the correct structure to get indexed
      (3.3.9 documentation)
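
If you want to check in one of your custom steps whether a body still carries the <html><body> wrapper, or strip markup yourself, here is a rough Python sketch. This is not Squirro's implementation and does not use any Squirro API. It only uses the standard library, and the names `UNSAFE_TAGS`, `remove_html` and `has_html_wrapper` are made up for illustration. The list of "unsafe" tags is an assumption as well.

```python
from html.parser import HTMLParser

# Assumption for illustration only: tags whose content we drop entirely.
# This is NOT the list Squirro's Sanitize HTML step uses.
UNSAFE_TAGS = {"script", "style", "iframe"}


class _TextExtractor(HTMLParser):
    """Collects text content and skips anything nested inside UNSAFE_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in UNSAFE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in UNSAFE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Join the collected fragments and collapse runs of whitespace.
        return " ".join("".join(self._parts).split())


def remove_html(body):
    """Hypothetical helper: return the plain text of an HTML string."""
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()
    return parser.text()


def has_html_wrapper(body):
    """Hypothetical helper: True if the body is wrapped in <html>...</html>."""
    stripped = body.strip().lower()
    return stripped.startswith("<html") and stripped.endswith("</html>")
```

For example, `has_html_wrapper("<html><body> actual body content </body></html>")` is true, and `remove_html` on the same string gives back just the text. What the real steps do in detail may differ, so treat this only as a way to inspect your own item bodies.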