Presented at Drataverse by Cody Wright, Co-Founder and CTO @ HyperComply
HyperComply's Goal: automate security questionnaires as completely as possible with as few errors as possible.
HyperComply now offers customers the option to apply generative AI to answer their questionnaires in addition to using existing machine learning and expert reviewers. This article summarizes Cody Wright’s speech at Drataverse 2023, outlining how HyperComply is completing security questionnaires faster and more accurately using generative AI.
“HyperComply Respond uses machine learning + expert reviewers to help companies respond to questionnaires faster. To do this, we build a knowledge base of structured security information.
“Before we talk about using generative AI to generate answers, we need to understand candidate generation: searching a customer’s knowledge base for relevant reference content. Given a query (a question), the system generates a set of relevant candidates (responses) that should contain the information needed to answer the incoming question.
For HyperComply, this reference content is structured information about the customer’s security posture including: policy documents, a repository of past security questionnaires, and information ingested from partner companies like Drata.
Over the last five years we’ve seen the industry move from “lexical search” to “semantic search.” Lexical search uses algorithms like TF-IDF to run keyword searches against a database, whereas semantic search tries to understand the meaning of a query and search the knowledge base on semantic terms rather than just the incoming text. For example, if a customer searched for “Is data encrypted at rest,” lexical search would easily find other results mentioning “encryption,” but it wouldn’t recognize that “How is data stored” is a strong answer because of the link between how data is stored and encryption at rest; this is where semantic search comes in.
While lexical search was relatively easy to implement using tools like Elasticsearch, semantic search requires all knowledge base items to be converted to “embeddings” (discussed below) so we can search using a deeper understanding of the content.”
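To make that gap concrete, here is a minimal sketch using scikit-learn’s TF-IDF tooling (an illustrative stand-in, not HyperComply’s actual stack) that scores the two example questions against a query by keyword overlap alone:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Knowledge-base entries and the incoming question from the example above.
kb = ["Is data encrypted at rest?", "How is data stored?"]
query = "Is customer data encrypted at rest?"

vectorizer = TfidfVectorizer().fit(kb + [query])
kb_vecs = vectorizer.transform(kb)
query_vec = vectorizer.transform([query])

# Keyword overlap scores the first entry highly but rates the relevant
# paraphrase "How is data stored?" far lower: the lexical-search gap.
print(cosine_similarity(query_vec, kb_vecs))
```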
“Security information is originally encoded as natural language. Machine learning and AI systems, however, operate on “vectors” and “embeddings,” so we need to convert our natural language data in order to search it. To do this, we use a model, either off the shelf or custom trained, that reliably converts natural language data into vectors that can be used for semantic search.
The model has a deeper understanding of the text, effectively “ranking” each item on a large number of attributes. In our case, we use a custom fine-tuned descendant of the BERT model, which produces 768-dimensional vectors; you can think of it as scoring each item on 768 different “categories.” While you can use off-the-shelf models to generate embeddings, once you have sufficient domain-specific data you can get significantly better results by fine-tuning the model.”
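For illustration, here is what embedding generation might look like with the open-source sentence-transformers library; the model name is an assumption standing in for the custom fine-tuned model described above:

```python
from sentence_transformers import SentenceTransformer

# Illustrative off-the-shelf BERT descendant with 768-dimensional output;
# it stands in for the custom fine-tuned model described in the talk.
model = SentenceTransformer("all-mpnet-base-v2")

items = [
    "All customer data is encrypted at rest using AES-256.",
    "Access to production systems requires hardware-backed MFA.",
]
embeddings = model.encode(items)
print(embeddings.shape)  # (2, 768): one 768-dimensional vector per item
```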
“Now that we have embeddings for each item in the knowledge base, we can use semantic search to find the items that are “closest” in vector space, i.e., the knowledge base embeddings most similar to the incoming question. We use linear algebra on the embedding vectors to find items that “rank” similarly across the categories the model scores.
The above diagram shows how this math works out in practice on common questions from security questionnaires. Cosine similarity is used to measure relatedness: 1 is a perfect match, 0 is unrelated. You can see that the top two questions about data encryption are scored as reasonably similar (although still below the threshold we’d use as “similar enough” for autocomplete purposes), while they are very different from the bottom questions about various names. We can use these similarity scores to rank semantic similarity between items, allowing us to automatically determine relevance.
We use a vector database to run this search quickly across a customer’s knowledge base for every incoming question.”
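For the curious, the “closest in vector space” math is just cosine similarity. Below is a minimal NumPy sketch; the threshold value and helper names are illustrative assumptions, and in production the vector database handles this step:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 means a perfect match in vector space; 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_candidates(query_vec, kb_vecs, kb_items, threshold=0.75, k=5):
    """Rank knowledge-base items by similarity to the incoming question
    and keep only those above the (illustrative) confidence threshold."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), item)
         for vec, item in zip(kb_vecs, kb_items)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [(score, item) for score, item in scored[:k] if score >= threshold]
```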
“Before going any further, let’s define generative AI and outline what it means in this situation specifically.
As you can see above, “Generative AI is a type of artificial intelligence (AI) system capable of generating text, images, or other media in response to prompts. Generative AI models learn the patterns and structure of their input training data, and then generate new data that has similar characteristics.” (https://en.wikipedia.org/wiki/Generative_artificial_intelligence)
For our purposes, we’re focused on feeding in relevant knowledge content via the “prompt” and using that to generate a new answer based on previous answers (or no answer if the knowledge is insufficient).”
“First, we need to generate candidates. To do this, we run the semantic search described above and filter the results against a similarity threshold so that only sufficiently similar items remain.
Next, we generate a prompt that combines those candidates with static instructions telling the model what we want and which knowledge items it should use to generate its response. The instructions must be very specific to keep the model on track.
Finally, we get to response generation. The generative AI model takes the input prompt and generates a response that answers the incoming question.”
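A hedged sketch of these steps using the OpenAI Python client follows; the prompt wording, model choice, and “NO ANSWER” convention are illustrative assumptions rather than HyperComply’s actual prompt:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_question(question: str, candidates: list[str]) -> str:
    """Build a prompt from the filtered candidates, then ask the model
    to answer strictly from that reference material."""
    knowledge = "\n".join(f"- {c}" for c in candidates)
    prompt = (
        "Answer the security questionnaire question below using ONLY the "
        "reference answers provided. If they are insufficient, reply "
        "exactly 'NO ANSWER'.\n\n"
        f"Reference answers:\n{knowledge}\n\n"
        f"Question: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep responses as deterministic as possible
    )
    return response.choices[0].message.content
```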
“Because this is Drataverse, and because it’s important, we can’t skip the privacy and compliance concerns here!
It’s important to note that this space is moving extremely quickly, so what’s true today could very well change tomorrow.
Public models are generally “free” because their purpose is to collect a large quantity of data, though you can often opt out of that collection. It is very important to review the data usage policies for these models if you are working with any kind of private data.
HyperComply has opted out of all data collection for training. Our current model is to have customers opt into OpenAI use, and we provide them with explicit links to the OpenAI data usage policy as well as an attestation that we are opted out of all data collection for training.”
“Today you can opt out of data tracking for training purposes on both the public and private OpenAI interfaces (ChatGPT and direct API usage). As of this writing, ChatGPT is opted into data collection by default, and the paid API is opted out by default.
An argument can be made that you should allow OpenAI to record your data for training so their models improve at your use case over time. In our case this isn’t necessary: to improve responses over time we don’t need to train the model; instead, we improve our prompt generation internally.”
“Our goal with this project is to automate security questionnaires as completely as possible with as few errors as possible.
Currently, we use expert reviewers to review automated responses to ensure best-in-class accuracy, but customers often do not want external contractors touching their security data. At the same time, customers don’t want to do this work manually either.
To get to the point where we can remove reviews by our experts, performance needs to be as close to human as possible: at least 80% completion with 80% accuracy. As a customer’s knowledge base grows over time, our completion rate naturally rises because we gain a better picture of their security posture, so we weigh accuracy over completion rate as our gold standard.”
“I mentioned this earlier when we talked about semantic search, but before we can do anything, we need to collect as much data as possible to feed into the generative AI prompt. To do this, we bring in documents from partners like Drata automatically, and the customer uploads any past questionnaires and useful reference documentation during onboarding.”
“Next, we take everything from the last slide and generate embeddings. Short, fact-based items such as Q&A data and control data are embedded directly. Documents are split into smaller chunks, and those chunks are embedded so that searches surface focused facts rather than entire document contents.
Remember that embeddings let us search for similar items, not just exact matches on the query text. This is especially important for document content, which is rarely structured the same way as the questions it answers.”
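As a rough illustration, chunking might look like the following sketch, where the window size and overlap are assumed values rather than HyperComply’s actual parameters:

```python
def chunk_document(text: str, max_words: int = 120, overlap: int = 20) -> list[str]:
    """Split a long document into overlapping word windows so each
    embedding captures a focused fact rather than a whole document."""
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # final window already covers the end of the document
    return chunks
```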
“Now we can start generating answers to questions! A customer can upload a questionnaire in any format into HyperComply, and we will convert the messy information into structured question/answer pairs for you.
From there, we attempt to autocomplete each question by going through all the steps we have discussed: generating new embeddings, finding semantic similarities across the knowledge base, generating a prompt for the generative AI model, sending that prompt to the model, and generating a result as an answer.”
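Composing the hypothetical helpers from the earlier sketches (top_candidates, answer_question), the end-to-end flow might look roughly like this:

```python
def autocomplete(question, kb_vecs, kb_items, embed, threshold=0.75):
    """End-to-end sketch: embed the question, retrieve similar knowledge,
    and attempt an answer only when candidates clear the threshold."""
    query_vec = embed(question)  # e.g. model.encode from the embedding sketch
    candidates = top_candidates(query_vec, kb_vecs, kb_items, threshold)
    if not candidates:
        return None  # nothing similar enough: route to a human reviewer
    answer = answer_question(question, [item for _, item in candidates])
    return None if answer.strip() == "NO ANSWER" else answer
```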
“And so how is this working out for HyperComply? What results are we seeing?
It’s first worth noting that the average customer has a “developing” knowledge base: the more data we have, the higher we get. Customers who have used us for a year or more regularly get close to 100% completion (completion = attempt rate × accuracy rate).
With the previous-generation model, we had to hold a very high confidence threshold when generating candidates to avoid false positives. Autocomplete is only valuable to our customers if they can trust that the answer is right in the vast majority of instances. Since we were only ever directly reusing previous answers, we could automatically attempt a relatively small number of answers, leaving a lot of work to the human experts.

With GPT, we are able to “attempt” (i.e., take a swing at) many more questions with the same or better accuracy. This is because the generative model allows us both to use subsets of previous answers and to combine multiple responses into a single answer. We’re also able to use document contents much more effectively.
This has allowed us to increase our automated attempt rate without sacrificing accuracy, raising our automated response rate by 150% while holding accuracy constant. We’re just getting started here: both the attempt and accuracy rates can be improved further through continued refinement of the embedding model and of our prompt structure.
Overall, we are thrilled with these results and look forward to iterating on this with our customers to speed up the questionnaire response process for them as much as possible.”
Q. Do you have a standard set of questions people get? Or a standard questionnaire you get people to answer?
A. If a customer doesn’t have much material to seed their knowledge base during onboarding, or hasn’t completed many questionnaires yet, we will get them to fill out a SIG or CAIQ Lite to add to their knowledge base (we license both). We also have something called a “Security Profile,” which is based on ~150 questions that are standard across most questionnaires; we automate the completion of this profile during onboarding based on whatever material the customer can give us. They can then keep this information up to date and use it as their central source of truth for answering questionnaires.
Q. What do you do if different teams have different answers? For example, if a product team would answer a question differently than a development team?
A. HyperComply offers first-class support for complicated organizations or product structures via “Knowledge Segmentation.” For example, when you upload a document you can assign it to a segment such as a specific product line or a region; when you then submit a questionnaire to HyperComply, you can tag it with the same segment and we will only use information from that segment to complete the questionnaire. When it comes to completing a questionnaire in collaboration with other teams, we offer integrations with Slack, Salesforce, and Teams to make collaboration easy.
In terms of how to handle teams that might use different language in their responses, that’s a place where HyperComply can really help. Generally, our customers only want to respond in a single “voice” to questionnaires to keep their outward-facing security posture as consistent as possible. HyperComply helps customers achieve this by automatically detecting similar results and only using the most common answers to keep things consistent.
Q. How do you determine the accuracy of answers?
A. Accuracy is determined by the “acceptance rate”: how often the answers HyperComply gave are accepted without being changed at all. We count anything as small as adding a comma as a change.
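Under that definition, the metric reduces to an exact-match check over generated and final answers, as in this illustrative sketch:

```python
def acceptance_rate(answers: list[tuple[str, str]]) -> float:
    """answers holds (generated, final) pairs; any edit at all,
    even an added comma, counts the pair as changed."""
    if not answers:
        return 0.0
    accepted = sum(1 for generated, final in answers if generated == final)
    return accepted / len(answers)
```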
Q. Can you set proactive reminders for people to update the information in their account (knowledge base)?
A. Yes! You can also auto-expire information. So for example, when you upload your SOC2 report, you can set the document to expire when that SOC2 does and prompt you to upload the new version. We also have scheduled reminders on your Security Profile so you can automatically ping subject matter experts on whatever cadence you think is appropriate to ensure your core knowledge is up to date.
Q. Where do you see human error the most?
A. Human error actually affects the tool most not through our expert reviewers, but through inaccurate or expired data seeding the knowledge base. If someone puts wrong or outdated information in, the accuracy rates will be lower. To avoid this, as part of our onboarding process your HyperComply CSM will walk you through knowledge base hygiene and setup. We also run automated checks on your knowledge base to detect inconsistent answers so you can keep everything consistent from the start.
Have a question for our team? Email hello@hypercomply.com and we’re happy to get it answered.