I think I understand your concern, but if I've missed the point, please follow up! So ...

BillFranklin · on Feb 22, 2024

> It is what it is sadly.

This is what I mean: previously I built a similar search engine on top of Slack, Notion, etc., but didn't launch it because I thought requiring users to constantly add bots to private channels would be a subpar experience and a blocker for good UX. Maybe you'll find a nice solution, though!

Searching over public internal data is handled by a few existing tools, but it's the private side that is difficult to handle and disastrous to get wrong when managed ad hoc. E.g. someone accidentally adds the bot to a private Slack group called #layoffs :) So you'd want this handled properly and centrally.

I guess you'll also need to handle privacy well: ~maybe it's OK when run as a SaaS for db admins to have access to ingested data, but if it's OSS then the people that run it probably shouldn't be able to read the private data that's ingested, so now you need to handle search over encrypted data, which is a fun problem :D

yuhongsun · on March 2, 2024

Access control is an unglamorous but critical piece of what we're building. We're currently implementing automatic access syncing for a few sources, starting with Google Drive, Confluence, Jira, and Notion. By matching document access in the source to users and groups, and then to emails, we can finally map Danswer users to document-level access. So someone searching in Danswer will only get results from the set of documents they have access to in the source tool.
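A minimal sketch of that mapping, assuming each ingested document carries the source tool's access list as direct user emails plus group names, and that group membership has already been synced as emails. All names here (`SourceAcl`, `allowed_emails`, `filter_results`) are hypothetical illustrations, not Danswer's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class SourceAcl:
    """Access list copied from the source tool (hypothetical shape)."""
    user_emails: set[str] = field(default_factory=set)
    group_names: set[str] = field(default_factory=set)
    is_public: bool = False


def allowed_emails(acl: SourceAcl, groups: dict[str, set[str]]) -> set[str]:
    """Expand groups to member emails and union them with direct user grants."""
    emails = {e.lower() for e in acl.user_emails}
    for group in acl.group_names:
        emails |= {e.lower() for e in groups.get(group, set())}
    return emails


def filter_results(
    results: list[str],
    acls: dict[str, SourceAcl],
    groups: dict[str, set[str]],
    user_email: str,
) -> list[str]:
    """Keep only documents the searching user could open in the source tool."""
    email = user_email.lower()
    visible = []
    for doc_id in results:
        acl = acls.get(doc_id)
        if acl is None:
            continue  # no synced ACL: fail closed rather than leak the document
        if acl.is_public or email in allowed_emails(acl, groups):
            visible.append(doc_id)
    return visible


# Example data (hypothetical documents, groups, and emails).
groups = {"eng": {"alice@example.com", "bob@example.com"}}
acls = {
    "design-doc": SourceAcl(group_names={"eng"}),
    "offsite-plan": SourceAcl(user_emails={"carol@example.com"}),
    "handbook": SourceAcl(is_public=True),
}
print(filter_results(["design-doc", "offsite-plan", "handbook"], acls, groups, "Alice@example.com"))
# ['design-doc', 'handbook']
```

Failing closed on documents with no synced ACL is the important design choice: a missed sync then hides a document instead of exposing it, which is the #layoffs failure mode raised above.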

For Slack, it would look something like this: get the members of the Slack channel and map those Slack users to Danswer users. Then only those Danswer users will be able to get results from that channel.
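A rough sketch of the Slack step, assuming the channel's member IDs and each member's email have already been fetched (Slack's Web API exposes `conversations.members` for member IDs, and `users.info` returns a profile email when the app has the `users:read.email` scope). The in-memory dicts and the `channel_access` function are hypothetical stand-ins; the Danswer user lookup is assumed to be keyed by lowercase email.

```python
def channel_access(
    channel_member_ids: list[str],
    slack_id_to_email: dict[str, str],
    danswer_users_by_email: dict[str, str],
) -> set[str]:
    """Return the Danswer user IDs allowed to see results from one Slack channel."""
    allowed: set[str] = set()
    for slack_id in channel_member_ids:
        email = slack_id_to_email.get(slack_id)
        if email is None:
            continue  # bots or users without a visible email cannot be mapped
        danswer_id = danswer_users_by_email.get(email.lower())
        if danswer_id is not None:
            allowed.add(danswer_id)  # only members with a Danswer account get access
    return allowed


# Example data (hypothetical IDs and emails).
members = ["U01", "U02", "B01"]
slack_emails = {"U01": "Alice@example.com", "U02": "bob@example.com"}
danswer_users = {"alice@example.com": "danswer-1"}
print(channel_access(members, slack_emails, danswer_users))
# {'danswer-1'}
```

Members who can't be mapped (no email, or no Danswer account) simply get no access, so the channel's results stay restricted to people who are verifiably in it.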

nl · on Feb 22, 2024

> ~maybe it's OK when run as a SaaS for db admins to have access to ingested data, but if it's OSS then the people that run it probably shouldn't be able to read the private data that's ingested

I don't understand the distinction here. If Danswer runs a SaaS version, then yes, I agree they can have a license agreement that lets their DB admins see data in some cases, which is fine. That seems orthogonal to whether a company is running the OSS version internally, in which case presumably its administrators can see all docs (but software administrators can usually do this anyway).

Weves · on Feb 22, 2024

Yep, this is exactly correct! For our SaaS version, we do have an agreement which allows us to look at data if needed to debug issues and/or improve search performance.

For self-hosted deployments, usually a select few admins who have set up the plumbing on AWS do have access (but as nl mentioned, these people usually have superuser access on the tools we connect to anyway, so this is a no-op).