After I import a batch of URLs into my dataset, the AI Data "Index" field is empty, or when I enter an individual URL on the ‘Add Site Content’ page I get an error that says “The URL you provided is invalid. Please verify and re-enter a valid URL.”
If you can visit the URL(s) in question directly in your browser, then your server or DNS provider is most likely using a bot-prevention system or firewall that is blocking our automated scraper (content ingestor bot) from downloading your page’s text content.
If you are using Cloudflare, follow the instructions below to disable the bot-protection temporarily so that you can index your content:
- Log in to your Cloudflare account.
- Select the domain from your websites list (if you have more than one domain in Cloudflare).
- In the left menu, select "Bots".
- Turn off "Bot Fight Mode" to allow the scraper to download your content.
- When you have finished adding your URLs to your dataset, you can re-enable Bot Fight Mode if it is required for your site.
If you are using another type of bot prevention, you will most likely need to whitelist our IP addresses and/or our domains so that our indexer can access (scrape) your page content.
Indexer (content scraper) IP addresses:
- 52.53.60.15
- 54.176.22.64
Indexer (scraper) domains:
- api.shopbotpro.com
- shopbotpro.appzonio.com
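If your bot prevention runs in your own application code rather than in a firewall product, the whitelist check comes down to comparing the client address or hostname against the values listed above. The following is a minimal Python sketch of that comparison, not a required setup. The function names `is_indexer_ip` and `is_indexer_domain` are hypothetical, and how you obtain the client IP or hostname depends on your server or framework.

```python
import ipaddress

# Indexer IP addresses and domains copied from the lists above.
INDEXER_IPS = frozenset(
    ipaddress.ip_address(ip) for ip in ("52.53.60.15", "54.176.22.64")
)
INDEXER_DOMAINS = frozenset({"api.shopbotpro.com", "shopbotpro.appzonio.com"})


def is_indexer_ip(client_ip: str) -> bool:
    """Return True if client_ip is one of the listed indexer addresses.

    Hypothetical helper: malformed input is treated as "not whitelisted".
    """
    try:
        return ipaddress.ip_address(client_ip.strip()) in INDEXER_IPS
    except ValueError:
        return False


def is_indexer_domain(hostname: str) -> bool:
    """Return True if hostname is one of the listed indexer domains.

    Hypothetical helper: the comparison ignores case and a trailing dot.
    """
    host = hostname.strip().rstrip(".").lower()
    return host in INDEXER_DOMAINS
```

A request handler could call `is_indexer_ip(...)` before applying its bot checks and skip them when the result is `True`. If your site sits behind a proxy or CDN, make sure the address you pass in is the real client address and not the proxy's.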
If you cannot get the automatic indexer to ingest your URL’s text content, please contact us using the “Contact Support” button at the bottom of any page in the control panel, and we will let you know why your specific URL(s) are not being indexed correctly and how to resolve the issue.