We talk to our users regularly and periodically add support for new knowledge source formats.
Currently the web application supports ingestion of the following types of sources.
1. Public URLs
These are publicly available web pages (not behind a login). Examples include a single-page article describing a product or an informative blog post. You can ingest as many as you like, one at a time. We will soon add support for ingesting multiple URLs simultaneously.
The KNO engine will scan tables, text and alt text in your page’s HTML code.
If a URL points to a PDF, KNO will scan that too.
The app will report an error if a web page returns a 40X status code (for example, 404).
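If you would like to catch failing URLs before submitting them, the check can be sketched on your side as below. This is only an illustration and not how KNO itself performs the check: the function names are hypothetical, "40X" is read here as any status from 400 to 499, and the script uses a HEAD request, which some servers do not support.

```python
import urllib.error
import urllib.request


def blocks_ingestion(status: int) -> bool:
    """True for client-error status codes (assumed range: 400-499)."""
    return 400 <= status <= 499


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the code is on the exception.
        return err.code


def precheck(urls, get_status=fetch_status):
    """Split urls into (ok, rejected) lists based on their status codes."""
    ok, rejected = [], []
    for url in urls:
        if blocks_ingestion(get_status(url)):
            rejected.append(url)
        else:
            ok.append(url)
    return ok, rejected
```

Calling precheck(["https://www.yoursaascompany.io/pricing"]) returns the URLs that look safe to submit and the ones likely to be rejected. Network failures such as DNS errors are not handled in this sketch and will raise urllib.error.URLError.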
2. Public Websites
These are publicly available websites with a nested structure. Examples include your community page, public product support documentation, newsrooms and more.
Add a parent web page and our engine will scan every child URL that starts with the parent URL prefix, as found in the sitemap and in the links embedded on the page.
For example, if you add https://www.yoursaascompany.io/help-center-articles, our engine will scan:
https://www.yoursaascompany.io/help-center-articles
https://www.yoursaascompany.io/help-center-articles/how-to-get-started
https://www.yoursaascompany.io/help-center-articles/how-to-add-a-team-member
But https://app.yoursaascompany.io and https://www.yoursaascompany.io will not be scanned.
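The prefix rule in this example can be sketched as a small function. This is an illustration under stated assumptions, not KNO's actual implementation: the name in_scope is hypothetical, and the sketch assumes matching happens on whole path segments with the scheme and host required to be identical.

```python
from urllib.parse import urlsplit


def in_scope(parent_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url sits at or below parent_url."""
    parent = urlsplit(parent_url)
    candidate = urlsplit(candidate_url)

    # Scheme and host must match, so a different subdomain is out of scope.
    if parent.scheme != candidate.scheme:
        return False
    if parent.netloc.lower() != candidate.netloc.lower():
        return False

    parent_path = parent.path.rstrip("/")
    candidate_path = candidate.path.rstrip("/")

    # Accept the parent itself or anything beneath it.
    # Assumption: the prefix must end on a path-segment boundary.
    return candidate_path == parent_path or candidate_path.startswith(
        parent_path + "/"
    )


PARENT = "https://www.yoursaascompany.io/help-center-articles"
for url in (
    "https://www.yoursaascompany.io/help-center-articles/how-to-get-started",
    "https://app.yoursaascompany.io",
    "https://www.yoursaascompany.io",
):
    print(url, in_scope(PARENT, url))
```

Under the segment-boundary assumption, a sibling page such as /help-center-articles-archive would be treated as out of scope. A plain string-prefix comparison would include it.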
3. Documents
These include PDFs, .doc, .docx, .txt and .pages files. The files must not be password protected. You can add multiple files in one go.
We limit the file size, and the limit may change from time to time. Currently it is 20 MB per file.
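The file type and size rules above can be checked locally before uploading. The sketch below is an assumption-laden illustration rather than part of the product: check_document is a hypothetical helper, 20 MB is taken to mean 20 * 1024 * 1024 bytes, and password protection is not detected.

```python
from pathlib import Path

# Extensions listed in this article.
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".pages"}
# Assumption: 1 MB is counted as 1024 * 1024 bytes.
MAX_FILE_BYTES = 20 * 1024 * 1024


def check_document(path: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    file_path = Path(path)
    problems = []

    if file_path.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {file_path.suffix or '(none)'}")

    if not file_path.is_file():
        problems.append("file not found")
    elif file_path.stat().st_size > MAX_FILE_BYTES:
        problems.append("file is larger than 20 MB")

    return problems
```

For example, check_document("onboarding-guide.pdf") returns an empty list when the file exists and is within the limit, and a list of readable problems otherwise.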
Going forward we will be adding the following source types.
YouTube playlists
Users will be able to add playlists using a simple channel name, and our engine will scan all the videos as sources and embed their transcripts and descriptions in our knowledge base. This will let your Assistants answer from informative video playlists meant for help and support, or just general information.
Authenticated documents on cloud
Soon, you will be able to train your Assistants from Google Docs, Sheets, and Slides. Maybe throw in your Confluence pages as well.
Simply point to the URLs of the cloud docs you are comfortable with the Assistants learning from. This is handy when you don't have the time to synthesise that content or code out a web page for it.
News Feed APIs
Have a news topic you wish to search for? Simply add a keyword, date range and geography, and KNO will add news articles relevant to your search.