Why are some URLs listed more than once in my AI dataset even though I only added them one time?
Site URLs with a lot of text content, such as long privacy statements, multi-page PDF documents, and FAQ lists, are automatically split into multiple dataset records. This is by design. The process is called “chunking”, and it is required for the AI engine to operate efficiently and return the most relevant content from large data sources.
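To illustrate the general idea of chunking, here is a minimal Python sketch. It is not Shop Bot Pro's actual implementation: the function names, the record fields (`url`, `chunk_index`, `content`), and the 200-word limit are all assumptions made for this example.

```python
def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def build_records(url: str, text: str, max_words: int = 200) -> list[dict]:
    """Create one dataset record per chunk, all sharing the same source URL."""
    return [
        {"url": url, "chunk_index": i, "content": chunk}
        for i, chunk in enumerate(chunk_text(text, max_words))
    ]


# A hypothetical 450-word page is stored as several records for one URL.
records = build_records("https://example.com/privacy", "word " * 450)
for record in records:
    print(record["url"], record["chunk_index"], len(record["content"].split()))
```

Each record carries the same source URL, which is why a single long page can appear several times in a dataset even though it was added only once.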
Even though there may be multiple records in your dataset for the same source URL, you do not have to worry about the specific URL appearing multiple times in any search results.
Shop Bot Pro automatically displays only one entry for any specific URL, regardless of how many records your dataset contains for that content URL.
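As a rough illustration of how several matching records can collapse into one search result per URL, consider the sketch below. This is an assumption about one possible approach, not a description of how Shop Bot Pro does it internally; the `score` field and the `dedupe_by_url` function are hypothetical.

```python
def dedupe_by_url(results: list[dict]) -> list[dict]:
    """Keep only the highest-scoring result for each URL, best match first."""
    best: dict[str, dict] = {}
    for result in results:
        url = result["url"]
        if url not in best or result["score"] > best[url]["score"]:
            best[url] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


# Hypothetical matches: two records come from the same privacy page.
matches = [
    {"url": "https://example.com/privacy", "chunk_index": 0, "score": 0.71},
    {"url": "https://example.com/faq", "chunk_index": 2, "score": 0.88},
    {"url": "https://example.com/privacy", "chunk_index": 3, "score": 0.93},
]

shown = dedupe_by_url(matches)
for result in shown:
    print(result["url"], result["score"])
```

In this sketch the privacy page is listed once, represented by its best-matching record, even though two of its records matched the search.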
You may want to review the index content for documents that are listed multiple times, create a summary of the document, and use that summary for the index instead of the long bulk content. Summarized content can help the AI engine determine the purpose of very long documents. You can also use the Search Excerpt generator feature to assist you in creating an AI-optimized summary of your long documents.
Note: If you summarize your long document, you can optionally remove any or all of the other duplicate records to fine-tune the AI responses regarding that content. If you need assistance optimizing large content sources for the best AI results and responses, please contact support and we will review the content and advise.