Limit length of abstract in search results using query endpoint with hit highlighting

Hi everyone,

I’m currently using the query endpoint from our Python code to deliver search results with hit highlighting to our React frontend. We want to show a short text excerpt with the hit highlights in our search results table, but in some cases the returned abstract is very large.
I’m wondering whether this endpoint offers a way to limit the length of the abstract (while still including at least one hit highlight). I noticed that your documentation mentions the parameter
options: { "abstract_size": 250 }
I’ve added that to our request, but it doesn’t change the length of the abstracts in our response. Can that parameter be used together with the highlight={"query": True} parameter?

Here is what our call looks like in Python:

results = squirro_client.query(
    cfg.squirro_settings.project_id,
    query=squirro_query,
    start=offset,
    highlight={"query": True},
    explain=True,
    count=limit,
    spellcheck=True,
    options={"abstract_size": 250},  # newly added, but it did not change the results
    aggregations={
        "business_unit": {
            "fields": "business_unit",
            "size": 1000,
            "method": "terms",
        },
        "classification": {
            "fields": "classification",
            "size": 1000,
            "method": "terms",
        },
        "countries": {
            "fields": "countries",
            "size": 1000,  # default = 10
            "method": "terms",
        },
        "sectors": {
            "fields": "sectors",
            "size": 1000,
            "method": "terms",
        },
        "topics": {
            "fields": "topics",
            "size": 1000,
            "method": "terms",
        },
        "content_type": {
            "fields": "content_type",
            "size": 1000,
            "method": "terms",
        },
    },
)

Example logging of the request we send to the query endpoint:

discover-backend     | 2023/01/27 15:42:00 - http.client - DEBUG   - send: b'{"query": "client", "aggregations": {"business_unit": {"fields": "business_unit", "size": 1000, "method": "terms"}, "classification": {"fields": "classification", "size": 1000, "method": "terms"}, "countries": {"fields": "countries", "size": 1000, "method": "terms"}, "sectors": {"fields": "sectors", "size": 1000, "method": "terms"}, "topics": {"fields": "topics", "size": 1000, "method": "terms"}, "content_type": {"fields": "content_type", "size": 1000, "method": "terms"}}, "start": 0, "count": 10, "highlight": {"query": true}, "options": {"abstract_size": 250}, "explain": true, "spellcheck": true}'

Example response:

{
   "count":9,
   "items":[
      {
         "id":"eb4Av2Zk5xKSMVFlcD5fig",
         "link":"https://ekdevelopment.sharepoint.com/sites/MubadalaTest/Enterprise%20Research%20and%20Studies/CowenandCompany_InitiateOutperformAutomatingDominating_Sep_16_2020.pdf",
         "read":true,
         "starred":false,
         "oversize":false,
         "title":"CowenandCompany_InitiateOutperformAutomatingDominating_Sep_16_2020",
         "abstract":[
            "Software PAYCOM SOFTWARE EQUITY RESEARCH INITIATING COVERAGE September 16, 2020 Price: $280.09 (09/15/2020 ) Price Target: $325.00 OUTPERFORM (1) INITIATE OUTPERFORM: AUTOMATING & DOMINATING Bryan C. Bergin, CFA 646 562 1369 bryan.bergin@cowen.com Jared Levine, CFA 646 562 1431 jared.levine@cowen.com Zachary Ajzenman 646 562 1363 zachary.ajzenman@cowen.com Key Data Symbol NYSE: PAYC 52-Week Range: $342.00 - $163.42 Market Cap: $16.4B Net Debt (MM): $(72.3) Cash/Share: $2.32 Dil. Shares Out (MM): 58.5 Enterprise Value (MM): $16,312.7 ROIC: NA ROE (LTM): NA BV/Share: $10.17 Dividend: NA FY (Dec) 2019A 2020E 2021E 2022E Revenue (MM) Q1 $200.0 $242.0A - - Q2 $169.0 $182.0A - - Q3 $175.0 $192.0 - - Q4 $193.0 $212.0 - - Year $738.0 $828.0 $1,000.0 $1,225.0 EV/S 22.1x 19.7x 16.3x 13.3x Consensus Rev$738.0 $828.0 $988.0 $1,202.0 Consensus source: Thomson Reuters EBITDA Year $317.9 $311.8 $396.4 $495.1 EV/ EBITDA 51.3x 52.3x 41.2x 32.9x Consensus EBITDA$318.00 $311.00 $386.00 $482.00 Consensus source: Thomson Reuters EPS Year $3.51 $3.21 $4.16 $5.23 P/E 79.8x 87.3x 67.3x 53.6x THE COWEN INSIGHT Initiate PAYC at Outperform & $325 PT. Platform innovation w/ a highly disciplined sales strategy & branding is changing HCM buying criteria with an emphasis on employee usage & measurable ROI. Robust unit additions, 20%+ sales productivity & underappreciated TAM will drive growth recovery & share gains from FY21 (+21% 3-year CAGR). Leading growth & profitability in HCM + SaaS deserves a premium. Help Yourself: PAYC Boasts a Differentiated Value Proposition PAYC's innovative HCM product development, with a highly disciplined sales strategy and strong messaging is changing buying criteria. It's driving a greater focus on employee self-service & proper usage to directly enable <squirro:highlight>client</squirro:highlight> cost reduction and is among the only providers with the proprietary tools to quantify the benefits via DDX. 
An intuitive product and its GTM model has translated to robust unit additions and 20%+ y/y sales productivity, and we see a recovery in growth from FY21, forecasting a +21% 3-year CAGR. Its differentiated value proposition was accelerating pre-COVID evidenced by a notable uptick in January 2020 lead volume (+600% y/y) and the rapid recovery to pre-COVID pipeline & bookings levels merely 2 months after the April/May trough demonstrates its durability and differentiated proposition. A material projected ramp in 2H20 S&M spend (3Q +30% y/y) and ample sales team headroom give us confidence in strong growth recovery to drive upside. Cowen SMB Payroll Survey Reveals Underappreciated TAM Opportunity Our proprietary SMB Payroll Survey supports significant further cloud payroll penetration opportunity, and we believe down-market unit growth is an underappreciated catalyst. PAYC’s innovation & branding amid increasing strategic relevance of intuitive HCM tech, changing workforce demographics, and nascent ~5% TAM penetration provide ample runway to sustain strong growth. We project ~207k businesses with 20-99 employees still utilizing legacy payroll methods that are ripe for cloud conversion. PAYC would require merely 6.5% of this opportunity to double its reported <squirro:highlight>client</squirro:highlight> count from FY19. Valuation: Premium Growth + Profitability Across HCM & SaaS We apply an 18.8x EV/S multiple on CY21E to arrive at our $325 PT. Premium is warranted based on PAYC’s robust growth & profitability and our view of a strong reacceleration & expanding market share. PAYC's industry-leading growth & EBITDA – not just in HCM, but across SaaS investments – has consistently exceeded 70% in recent years. However, PAYC is trading at over a 30% discount on EV/S relative to high-growth SaaS comps that have near-term COVID-19 tailwinds that likely dissipate. 
We believe PAYC’s impressive balance of high growth & profitability should receive greater appreciation as a more normalized macro environment materializes. COWEN.COMPlease see pages 38 to 42 of this report for important disclosures."
         ],
         "body":"<html><body><div class=\"page\"><p/>\n<p>Software\n</p>\n<p>PAYCOM SOFTWARE\n</p>\n<p>EQUITY RESEARCH INITIATING COVERAGE\n</p>\n<p>September 16, 2020\n</p>\n<p>Price: $280.09 (09/15/2020 )\nPrice Target: $325.00\n</p>\n<p>OUTPERFORM (1)\n</p>\n<p>INITIATE OUTPERFORM: AUTOMATING &amp;\nDOMINATING\n</p>\n<p>Bryan C. Bergin, CFA\n646 562 1369\nbryan.bergin@cowen.com\n</p>\n<p>Jared Levine, CFA\n646 562 1431\njared.levine@cowen.com\n</p>\n<p>Zachary Ajzenman\n646 562 1363\nzachary.ajzenman@cowen.com\n</p>\n<p>Key Data\nSymbol NYSE: PAYC\n</p>\n<p>52-Week Range: $342.00 - $163.42\n</p>\n<p>Market Cap: $16.4B\n</p>\n<p>Net Debt (MM): $(72.3)\n</p>\n<p>Cash/Share: $2.32\n</p>\n<p>Dil. Shares Out (MM): 58.5\n</p>\n<p>Enterprise Value (MM): $16,312.7\n</p>\n<p>ROIC: NA\n</p>\n<p>ROE (LTM): NA\n</p>\n<p>BV/Share: $10.17\n</p>\n<p>Dividend: NA\n</p>\n<p>FY (Dec) 2019A 2020E 2021E 2022E\nRevenue (MM)\n</p>\n<p>Q1 $200.0 $242.0A - -\n</p>\n<p>Q2 $169.0 $182.0A - -\n</p>\n<p>Q3 $175.0 $192.0 - -\n</p>\n<p>Q4 $193.0 $212.0 - -\n</p>\n<p>Year $738.0 $828.0 $1,000.0 $1,225.0\n</p>\n<p>EV/S 22.1x 19.7x 16.3x 13.3x\n</p>\n<p>Consensus Rev$738.0 $828.0 $988.0 $1,202.0\nConsensus source: Thomson Reuters\n</p>\n<p>EBITDA\n</p>\n<p>Year $317.9 $311.8 $396.4 $495.1\nEV/\nEBITDA 51.3x 52.3x 41.2x 32.9x\n</p>\n<p>Consensus EBITDA$318.00 $311.00 $386.00 $482.00\nConsensus source: Thomson Reuters\n</p>\n<p>EPS\n</p>\n<p>Year $3.51 $3.21 $4.16 $5.23\n</p>\n<p>P/E 79.8x 87.3x 67.3x 53.6x\n</p>\n<p>THE COWEN INSIGHT\nInitiate PAYC at Outperform &amp; $325 PT. Platform innovation w/ a highly disciplined sales\nstrategy &amp; branding is changing HCM buying criteria with an emphasis on employee usage\n&amp; measurable ROI. Robust unit additions, 20%+ sales productivity &amp; underappreciated TAM\nwill drive growth recovery &amp; share gains from FY21 (+21% 3-year CAGR). 
Leading growth &amp;\nprofitability in HCM + SaaS deserves a premium.\n</p>\n<p>Help Yourself: PAYC Boasts a Differentiated Value Proposition\nPAYC\\'s innovative HCM product development, with a highly disciplined sales strategy\nand strong messaging is changing buying criteria. It\\'s driving a greater focus on employee\nself-service &amp; proper usage to directly enable <squirro:highlight>client</squirro:highlight> cost reduction and is among the\nonly providers with the proprietary tools to quantify the benefits via DDX. An intuitive\nproduct and its GTM model has translated to robust unit additions and 20%+ y/y sales\nproductivity, and we see a recovery in growth from FY21, forecasting a +21% 3-year CAGR.\nIts differentiated value proposition was accelerating pre-COVID evidenced by a notable\nuptick in January 2020 lead volume (+600% y/y) and the rapid recovery to pre-COVID\npipeline &amp; bookings levels merely 2 months after the April/May trough demonstrates its\ndurability and differentiated proposition. A material projected ramp in 2H20 S&amp;M spend (3Q\n+30% y/y) and ample sales team headroom give us confidence in strong growth recovery to\ndrive upside.\n</p>\n<p>Cowen SMB Payroll Survey Reveals Underappreciated TAM Opportunity\nOur proprietary SMB Payroll Survey supports significant further cloud payroll penetration\nopportunity, and we believe down-market unit growth is an underappreciated catalyst.\nPAYC’s innovation &amp; branding amid increasing strategic relevance of intuitive HCM tech,\nchanging workforce demographics, and nascent ~5% TAM penetration provide ample\nrunway to sustain strong growth. We project ~207k businesses with 20-99 employees still\nutilizing legacy payroll methods that are ripe for cloud conversion. 
PAYC would require\nmerely 6.5% of this opportunity to double its reported <squirro:highlight>client</squirro:highlight> count from FY19.\n</p>\n<p>Valuation: Premium Growth + Profitability Across HCM &amp; SaaS\nWe apply an 18.8x EV/S multiple on CY21E to arrive at our $325 PT. Premium is warranted\nbased on PAYC’s robust growth &amp; profitability and our view of a strong reacceleration &amp;\nexpanding market share. PAYC\\'s industry-leading growth &amp; EBITDA – not just in HCM, but\nacross SaaS investments – has consistently exceeded 70% in recent years. However, PAYC\nis trading at over a 30% discount on EV/S relative to high-growth SaaS comps that have\nnear-term COVID-19 tailwinds that likely dissipate. We believe PAYC’s impressive balance\nof high growth &amp; profitability should receive greater appreciation as a more normalized\nmacro environment materializes.\n</p>\n<p>COWEN.COMPlease see pages 38 to 42 of this report for important disclosures.</p>\n<p/>\n</div>\n</body></html>",
         "language":"en",
         "created_at":"2023-01-23T18:17:41",
         "location":"None",
         "modified_at":"2023-01-23T18:17:50",
         "score":"None",
         "keywords":{
            "sectors":[
               "Food & Staples Retailing",
               "Advertising",
               "Software",
               "Materials",
               "Health Care Services"
            ],
            "sharepoint_id":[
               "fb34f480-cd82-4b60-8a3b-9fddcef7ed50"
            ],
            "contact_name":[
               "Kate Erfle"
            ],
            "topics":[
               "Financial Returns",
               "Human Capital",
               "Valuation",
               "Growth",
               "Revenue",
               "Benefits",
               "Opportunity",
               "Technology",
               "Revenue Retention",
               "Outperform",
               "Training",
               "Profitability",
               "Retention",
               "Strategy"
            ],
            "countries":[
               "United States"
            ],
            "topics_ids":[
               "564",
               "668",
               "629",
               "617",
               "666",
               "612",
               "615",
               "674",
               "562",
               "618",
               "619",
               "672",
               "669",
               "610"
            ],
            "uploaded_by_name":[
               "Kate Erfle"
            ],
            "classification":[
               "Internal"
            ],
            "country_ids":[
               "185"
            ],
            "sharepoint_parent_url":[
               "https://ekdevelopment.sharepoint.com/sites/MubadalaTest/Enterprise%20Research%20and%20Studies"
            ],
            "contact_email":[
               "kerfle@enterprise-knowledge.com"
            ],
            "classification_ids":[
               "2"
            ],
            "sharepoint_external_url":[
               "https://ekdevelopment.sharepoint.com/sites/MubadalaTest/Enterprise%20Research%20and%20Studies/CowenandCompany_InitiateOutperformAutomatingDominating_Sep_16_2020.pdf"
            ],
            "content_type":[
               "Research Item"
            ],
            "publication_date":[
               "2022-09-27T20:26:24.397000+00:00"
            ],
            "sector_ids":[
               "397",
               "469",
               "370",
               "257",
               "256",
               "443"
            ],
            "uploaded_by_email":[
               "kerfle@ekdevelopment.onmicrosoft.com"
            ]
         },
         "external_id":"81b3e984e1a3f8fe9660666050927a232f1d40ec52b0e5c4b76d1e57c98be464",
         "explanation":{
            "matches":{
               "body":[
                  {
                     "term":"client",
                     "score":3.0719264
                  }
               ],
               "body.stemmed":[
                  {
                     "term":"client",
                     "score":2.5569448
                  }
               ]
            }
         },
         "related_items":[
            
         ],
         "has_sub_items":true,
         "files":[
            {
               "mime_type":"application/pdf",
               "id":"j54TIPg6RVi8ssXT5ns-eg",
               "representations":{
                  "application/pdf":"/storage/localfile/70/0d/a43e92730e153a8f/3-CowenandCompany_InitiateOutperformAutomatingDominating_Sep_16_2020.pdf"
               },
               "link":"/storage/localfile/70/0d/a43e92730e153a8f/3-CowenandCompany_InitiateOutperformAutomatingDominating_Sep_16_2020.pdf"
            }
         ],
         "has_matching_sub_items":true,
         "communities":[
            
         ],
         "sources":[
            {
               "id":"RBeKeratREi4uPp9uM97Lw",
               "title":"Discover File Upload API"
            }
         ]
      },
      ...
    ...
   "time_ms":93
}

As you can see, the abstract is far longer than 250 characters. Is there a built-in way to limit this?
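Until the server-side option takes effect, one client-side fallback is to trim each abstract to roughly 250 visible characters around its first hit, which keeps at least one highlight in the excerpt. This is only a sketch under stated assumptions: it assumes hits are marked with the <squirro:highlight> tags seen in the response above, and the helper names (trim_around_highlight, visible_text) are hypothetical, not part of the Squirro client.

```python
import re

# Hit markers as they appear in the abstracts returned by the query endpoint.
HL_OPEN = "<squirro:highlight>"
HL_CLOSE = "</squirro:highlight>"

# One token per whitespace-separated word, but a whole highlight (even a
# multi-word one) stays in a single token so its tags are never cut in half.
TOKEN_RE = re.compile(
    r"\S*<squirro:highlight>.*?</squirro:highlight>\S*|\S+", re.DOTALL
)


def visible_text(fragment: str) -> str:
    """Text as the user sees it, without the highlight tags."""
    return fragment.replace(HL_OPEN, "").replace(HL_CLOSE, "")


def trim_around_highlight(fragment: str, size: int = 250) -> str:
    """Shorten a fragment to about `size` visible characters, keeping the
    first highlighted hit inside the window (hypothetical helper).
    Whitespace in the result is collapsed to single spaces."""
    if len(visible_text(fragment)) <= size:
        return fragment
    tokens = TOKEN_RE.findall(fragment)
    if not tokens:
        return ""
    # Start from the first token containing a hit, or the first token if none.
    hit = next((i for i, t in enumerate(tokens) if HL_OPEN in t), 0)
    lo, hi = hit, hit + 1
    length = len(visible_text(tokens[hit]))
    grown = True
    while grown:
        grown = False
        # Try to add one word on the right, then one on the left.
        if hi < len(tokens):
            extra = 1 + len(visible_text(tokens[hi]))
            if length + extra <= size:
                length += extra
                hi += 1
                grown = True
        if lo > 0:
            extra = 1 + len(visible_text(tokens[lo - 1]))
            if length + extra <= size:
                length += extra
                lo -= 1
                grown = True
    # Mark the cut ends so readers know the text continues.
    prefix = "… " if lo > 0 else ""
    suffix = " …" if hi < len(tokens) else ""
    return prefix + " ".join(tokens[lo:hi]) + suffix
```

Growing the window word by word, alternating sides, keeps the hit roughly centred and avoids cutting words or tags mid-way; the trade-off is that the result can be a few characters shorter than size.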

Thank you,
Kate

Hi Kate,

Thanks for reaching out to the Squirro Forum!
Could you please confirm which version of Squirro you are using?

This looks like a bug that has been fixed in more recent versions of Squirro.
As a temporary workaround, please add the following argument to your query method call:

fields=["abstract"]

Should this still be an issue, please let us know.
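For reference, here is the call from the original post with the workaround folded in. This is a sketch, not Squirro's documented usage: build_query_kwargs and FACET_FIELDS are hypothetical names, and any field names besides "abstract" are assumptions (add "keywords" or others if the results table shows them). It also builds the six identical aggregation entries in a loop instead of repeating them.

```python
# Facet fields from the original request; each gets the same terms aggregation.
FACET_FIELDS = [
    "business_unit", "classification", "countries",
    "sectors", "topics", "content_type",
]


def build_query_kwargs(query: str, offset: int, limit: int,
                       abstract_size: int = 250) -> dict:
    """Keyword arguments for squirro_client.query(), with the suggested
    fields=["abstract"] workaround applied (hypothetical helper)."""
    return {
        "query": query,
        "start": offset,
        "count": limit,
        "highlight": {"query": True},
        "explain": True,
        "spellcheck": True,
        "options": {"abstract_size": abstract_size},
        # Workaround: request the abstract explicitly. "title" and "link" are
        # assumptions; list whichever fields the frontend actually reads.
        "fields": ["abstract", "title", "link"],
        "aggregations": {
            name: {"fields": name, "size": 1000, "method": "terms"}
            for name in FACET_FIELDS
        },
    }


# Usage with the client and settings from the original post:
# results = squirro_client.query(
#     cfg.squirro_settings.project_id,
#     **build_query_kwargs(squirro_query, offset, limit),
# )
```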

Thanks and regards,
Peter Brejza

Hi Peter,

Thank you for the reply! We are using Squirro v3.6.5.

Adding fields=["abstract", ...] did give us shorter abstracts, and we no longer get any very long excerpts. However, some of them are now very short. Here is the abstract list we get for one of our search results:

'abstract': ['and <squirro:highlight>Company</squirro:highlight>.', 'At the close of 2019, PAYC served 13,581 clients (based on parent <squirro:highlight>company</squirro:highlight> grouping).', 'html> Figure 8 Ramping Investments In Advertising Source: <squirro:highlight>Company</squirro:highlight>', 'Reports, Cowen and <squirro:highlight>Company</squirro:highlight> Limited Market Share Penetration Supports Long-Term Growth', 'Figure 9 Limited Unit Penetration Of Addressable Market Source: <squirro:highlight>Company</squirro:highlight> Reports, Cowen']

Until now I have been using the first item in the abstract list as our excerpt, but in this case that excerpt is just "and Company", which is not ideal to display in search results. Is there a way to set a minimum length? Or do you have a suggestion for how to handle this? Right now, I’m thinking of using the longest excerpt from the abstract list instead, but I’d welcome other suggestions.
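The "longest excerpt" idea can be sketched in a few lines. One assumption worth stating: length is measured on the visible text with the <squirro:highlight> tags removed, so a fragment with several short hits does not win just because its markup is longer. The helper name longest_excerpt is hypothetical.

```python
HL_OPEN = "<squirro:highlight>"
HL_CLOSE = "</squirro:highlight>"


def visible_text(fragment: str) -> str:
    """Text as the user sees it, without the highlight tags."""
    return fragment.replace(HL_OPEN, "").replace(HL_CLOSE, "")


def longest_excerpt(abstracts: list[str]) -> str:
    """Pick the fragment with the most visible text (hypothetical helper).
    Returns an empty string when the item has no abstract fragments."""
    if not abstracts:
        return ""
    return max(abstracts, key=lambda f: len(visible_text(f).strip()))


# Usage, for each item in results["items"]:
# excerpt = longest_excerpt(item.get("abstract", []))
```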

Thank you and have a great evening!
Kate Erfle

Hi Kate,

Thanks for your reply!
Currently we do not support a minimum abstract size when using the query method.

For that reason, I would recommend your proposed approach of taking the longest excerpt from the abstract list. Alternatively, you could merge a few abstracts together to create a more meaningful excerpt.
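Merging a few fragments could look like the sketch below. It is an illustration only: the 120-character target, the three-fragment cap and the " … " separator are arbitrary choices, not Squirro settings, merge_excerpts is a hypothetical name, and fragments are simply taken in the order the endpoint returned them.

```python
HL_OPEN = "<squirro:highlight>"
HL_CLOSE = "</squirro:highlight>"


def visible_text(fragment: str) -> str:
    """Text as the user sees it, without the highlight tags."""
    return fragment.replace(HL_OPEN, "").replace(HL_CLOSE, "")


def merge_excerpts(abstracts: list[str], min_length: int = 120,
                   max_fragments: int = 3, sep: str = " … ") -> str:
    """Join fragments in the order returned until their combined visible
    text (separators not counted) reaches min_length, using at most
    max_fragments fragments (hypothetical helper)."""
    picked = []
    length = 0
    for fragment in abstracts[:max_fragments]:
        fragment = fragment.strip()
        if not fragment:
            continue
        picked.append(fragment)
        length += len(visible_text(fragment))
        if length >= min_length:
            break
    return sep.join(picked)
```

Stopping as soon as the target is reached keeps excerpts short when the first fragment is already meaningful, while a fragment like "and Company." gets some context from the next one.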

Should you have any other questions, please let me know.

Thanks and regards,
Peter Brejza

Hi Peter,

I ended up taking the longest excerpt in each list, and that looks a lot better. I’ll let you know if any other questions come up. Thanks for all of your help!

Best,
Kate Erfle