CarahCast: Podcasts on Technology in the Public Sector

Introduction to HIPAA Compliance in Software Development

Episode Summary

This podcast series is aimed at educating healthcare services and insurance companies about HIPAA compliance in the context of development and test data. Each episode focuses on key aspects of compliance, addressing technical, legal, and ethical challenges.

Episode Transcription

Nick Shuart

Hey everybody, this is Nick Shuart. I'm here with Jan Diana and George Barroso. And today we're going to be talking about HIPAA compliance for development and test data. How you doing today, George?

George Barroso

Good. How are you?

Nick Shuart

Not too bad. Thanks for coming in.

Jan Diana

Thanks for having me.

Nick Shuart

Absolutely. Glad you could make it. So, tell us a little bit about yourself. As far as development and test data, you know, how does that play into your daily life? How long have you been in this industry?

George Barroso

So I've been dealing with development and test data for well over 10 years. We handle, you know, de-identification, masking, redaction, you know, whatever some folks call obfuscation. But essentially, we've been doing that for our clients for 15 years or so, specifically around HIPAA and healthcare. We've been doing that for clients for 10 plus years now. And I think that was our first healthcare client. So you could say we have a lot of experience in helping them make sure that they have HIPAA compliant data in their development and test environments.

Nick Shuart

Very cool. Very cool. Obfuscation, by the way. That's a new word for me. I got to write it down.

George Barroso

Oh, yeah. There's all sorts. They come up with one every year.

Nick Shuart

Yeah, they do. Yeah. We were just talking yesterday about interoperability for everybody. You know, it's like, okay, synergy would work pretty well, too. That's right. But another thing you mentioned was HIPAA. What exactly is HIPAA? Why is it important to software developers?

George Barroso

So HIPAA stands for the Health Insurance Portability and Accountability Act. It's basically a law that protects patients' medical data, you know, or protected health information. That's abbreviated as PHI, so you hear a lot of people talk about PHI. That stands for protected health information. And it's very important because if you're doing development, you know, there are very strict regulations and kind of rules about how you handle the data, who's supposed to have access to the data because it's protected. And so if you're, you know, a developer at a healthcare firm or, you know, some company that's dealing with healthcare data, you do have to be very careful about how you handle that data because if it's not de-identified or, you know, masked, obfuscated, however you want to refer to it, if it's actual data, then there are very strict rules about who's supposed to see it and how you handle it. So that's why it's important to consider because if you violate those, then you have legal and financial issues.

Nick Shuart

And nobody wants those.

George Barroso

Nobody wants those.

Nick Shuart

Not in this day and age.

Jan Diana

Of course. To follow up with that, of course, what risks are associated with using real PHI in development environments?

George Barroso

Well, so, you know, in production, you always have, everything's very locked down. And so you're going to have your real health information in those systems. When you get to development and testing, they're not going to put the same controls around those systems. And many more people are going to have access to it, right? A production system might be a bunch of folks in a medical office that can see and use the system. But if you're a developer, you know, they might send some of that development offshore. They could send it to, you know, a third party here in the United States. But there's going to be way more, a lot more people accessing the data and looking at it. And so if it's not protected, encrypted, obfuscated, de-identified, all of that PHI is in the clear. Right? And that means that all of those developers, you know, have access to it. They can see people's names and addresses, potentially. Some sensitive health information could be there. And so it's a big risk for companies to have these development and testing systems that are not protected.

Jan Diana

And that makes sense.

George Barroso

And then that opens up and increases your risk, right? Because in production, maybe you have a few systems where people access it. You maybe have 100 people accessing it if you're a big developer. But in development, you might have 1,000 developers. That's 10 times the risk if you just look at pure numbers.

Nick Shuart

That's a good point. I didn't think about that at all. That really opens up the floodgates, potentially. Correct. Okay. So, are there methods to help de-identify PHI?

George Barroso

There are. And actually, HIPAA specifies two accepted approaches for protecting PHI and using it outside of the production scenario. One is called the Safe Harbor Method, and we'll talk about what that is probably in a few minutes here. And the other one is expert determination. Both of them are accepted, and they're actually in the HIPAA regulations, too. But those are the accepted methods for handling data that's going to be used in development and testing.

Jan Diana

Interesting. Okay. Awesome. So, you did mention one of the two methods would be the Safe Harbor Method? Yes. What exactly is it when it comes to the HIPAA compliance?

George Barroso

So, the Safe Harbor Method is essentially a strategy for making sure you can de-identify your dataset. And in HIPAA, the regulations lay out the 18 identifiers that are required to be de-identified or protected, treated. Things like names, addresses, technically, it's geographic data smaller than a state. And so, that means your street, your city, and potentially some parts of the zip code as well. Right. There's some very strict rules around that. And then, social security numbers, there's medical record numbers, but there's a whole list of 18 that need to be de-identified in your dataset in order for it to be considered safe.
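
The rules-based idea George describes can be sketched in a few lines of Python. This is a minimal illustration, not a complete Safe Harbor implementation: it handles only a handful of the 18 identifiers, the field names (name, street, city, ssn, medical_record_number, zip) are hypothetical, and the set of low-population ZIP prefixes is a placeholder that would have to come from current Census data. The ZIP handling follows the Safe Harbor rule that only the first three digits may be kept, and only when that three-digit area contains more than 20,000 people; otherwise the value becomes 000.

```python
import re

# Hypothetical field names for illustration; a real schema will differ,
# and the full Safe Harbor list has 18 identifier categories.
DIRECT_IDENTIFIERS = {"name", "street", "city", "ssn", "medical_record_number"}

# Placeholder set of 3-digit ZIP prefixes covering 20,000 or fewer people.
# The authoritative list must be derived from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059"}


def safe_harbor_zip(zip_code: str) -> str:
    """Keep only the first three ZIP digits, or 000 for low-population areas."""
    digits = re.sub(r"\D", "", zip_code)
    prefix = digits[:3]
    if len(prefix) < 3 or prefix in LOW_POPULATION_ZIP3:
        return "000"
    return prefix


def apply_safe_harbor(record: dict) -> dict:
    """Return a copy of the record with listed identifiers dropped or generalized."""
    treated = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop the identifier entirely
        if field == "zip":
            treated[field] = safe_harbor_zip(value)
        else:
            treated[field] = value
    return treated
```

Dropping fields outright is the bluntest possible treatment; it satisfies the rule but, as the conversation goes on to discuss, it can leave the data much less useful for testing.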

Jan Diana

Right. That makes sense. We don't want to just disseminate our information out into the world, so that would make sense. We want to make sure that it's safe. Yeah.

Nick Shuart

I hear data is kind of important these days. So, based off of that, you said there's 18 identifiers. Are there any specific challenges? Because you say things like social security number, addresses, things I wouldn't want out in the public. Are there any specific challenges when it comes to protecting those with the Safe Harbor Method?

George Barroso

Probably the biggest challenge with protecting those is making sure that the data is still usable afterwards.

Nick Shuart

Okay.

George Barroso

Right? It's easy to say, oh, well, that's no problem. I'll just replace it with all Xs.

Nick Shuart

Right.

George Barroso

Great. But if you've got business logic that's going to go and give you some report based on zip code, and all of a sudden it's all ones, that's going to be useless. Right. You no longer can tell whether or not that report's going to work. So the biggest challenge is maintaining utility when you're making changes like this. Okay. Right? That's the biggest kind of challenge with the Safe Harbor Method, making sure that the data, after you're done treating it, still has usability for whatever you're testing.
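
The utility problem can be made concrete with a small Python sketch. It compares blanket masking with a deterministic substitution, where the same input always produces the same replacement, so a report that groups by zip code still produces distinct groups. The key name SECRET_KEY and both function names are assumptions made for this illustration, and whether a particular substitution scheme satisfies Safe Harbor is a separate question this sketch does not answer.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept out of dev/test systems.
SECRET_KEY = b"replace-with-a-real-secret"


def mask_all_x(value: str) -> str:
    """Blanket masking: every value of the same length becomes identical."""
    return "X" * len(value)


def pseudonymize_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically map a digit string to another digit string of equal length.

    The same input always yields the same output, so grouping and joins on
    the masked column still line up. Two different inputs can occasionally
    collide on the same output.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    number = int(digest, 16) % (10 ** len(value))
    return str(number).zfill(len(value))
```

With mask_all_x, every zip code collapses into XXXXX and a per-zip report has a single row. With pseudonymize_digits, repeated zip codes stay grouped together, which is the kind of usability George is pointing at.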

Nick Shuart

Got it. That makes sense. Okay. I can definitely see that. Right.

Jan Diana

Well, now that we understand what the Safe Harbor Method is, would you like to do a little bit of a deep dive into the Expert Determination Method?

George Barroso

Yeah, sure. It applies to HIPAA. So the Expert Determination Method is used generally when you want to maintain the maximum usability or utility of the data. And really what that is, is you have a qualified expert that analyzes your data set, and you might have already removed some of the protected health information, but they'll analyze the data set to make sure that whatever you're sharing out to your third parties or whatever you're using internally for development, there's no risk of looking at that data and being able to re-identify that back to the original person that it belongs to. That would be something like, you know, if it only had, let's say, my test results from my last physical, but there was no indicator of who it belonged to, like my name wasn't in the data set, my address wasn't in the data set, my medical record number wasn't in the data set. It was purely used for just, okay, here's some test results, right? And it was kind of completely disconnected. You know, the expert could say, yes, this is safe to disseminate because it doesn't actually, you can't tie it back to anybody specifically.

Nick Shuart

Okay. So I have two questions there. First, say that information, that data was sent back to a medical professional. Is there a way to re-identify? And then second, you said that there's experts that use that method. Who qualifies as an expert in that case?

George Barroso

So the expert is someone who actually, obviously, the expert prefers that. The expert is actually somebody who has enough knowledge and understanding of the data set to be able to apply the statistical and scientific methods required to prove that you can't re-identify it. So, you know, and that does apply to that specific data set. So if the data set changes and you add a column or you add a bunch of data, you know, the next time around, now you have to go back, the expert has to go back in and re-certify that that is still safe.

Nick Shuart

Got it.

George Barroso

To address. And then your other question was, can you re-identify the data? Right. Potentially, yes. Okay. Depending on what's in the data set, it will be up to the expert to determine whether or not that re-identification risk is enough to worry about. Or, you know, let's say, for example, there was some internal identifier or token for each record, right, that only you had or only your company had, and when you shared it out, there was no way anybody could figure out that, you know, number three belongs to me and number five belongs to you.

Nick Shuart

Gotcha.

George Barroso

Right? I could have just re-identified.

Nick Shuart

Right. Okay.

George Barroso

In that scenario, then, yes, when the owner of the data, the company, you know, gets it back after the analysis, they could go in and map those back to them.

Nick Shuart

Okay.

George Barroso

And that's a very common thing that companies will look to do when they have to basically send it out for analysis and then get the results back and kind of marry that back with their regular data to determine, you know, whatever they're trying to figure out. Re-identification is possible in those scenarios and desired in those scenarios.
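
The token scenario described here can be sketched in Python as follows. This is an illustration under assumptions, not a description of any particular product: the function names tokenize and reidentify and the patient_id field are hypothetical. The point is that the tokens are random, so the shared data carries nothing derived from the original identifier, and only the party holding the private mapping can marry the results back to real records.

```python
import secrets


def tokenize(records, id_field="patient_id"):
    """Replace id_field with random tokens.

    Returns the shareable records and the private token-to-id mapping,
    which stays with the data owner and is never sent out.
    """
    token_to_id = {}
    id_to_token = {}
    shared = []
    for record in records:
        original = record[id_field]
        if original not in id_to_token:
            token = secrets.token_hex(8)  # random, not derived from the id
            id_to_token[original] = token
            token_to_id[token] = original
        shared.append({**record, id_field: id_to_token[original]})
    return shared, token_to_id


def reidentify(results, token_to_id, id_field="patient_id"):
    """Map tokens in returned results back to the original identifiers."""
    return [{**row, id_field: token_to_id[row[id_field]]} for row in results]
```

A third party sees only the tokens; when the analysis comes back, the owner calls reidentify with the mapping it kept. Whether the remaining columns are themselves safe to share is still a question for the expert.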

Nick Shuart

But it sounds like there's good checks and balances there in that process. Yes, there has to be. Absolutely.

George Barroso

Otherwise, you know, it will be too easy to reverse. I mean, reversing it is the big problem.

Jan Diana

Right. I actually have a little bit of a compound question when it comes to that also. The expert itself, now is this a literal person? Is there some AI attached to that or can there be some AI attached to that? Could it be an automated process or is it a literal person? And then also, in addition to that, how does this differ from the safe harbor method as well?

George Barroso

So, traditionally, it has always been a person because you have to have knowledge of how the data is used, what's stored in the data, and, you know, have information about whether or not you could re-identify. I'll give you actually a real world example, not from the healthcare space. This is from retail. But I think it applies and kind of illustrates what I mean by, you know, have to have knowledge of the data. So, we were doing the de-identification for a retail customer. They had their membership, you know, program, which, you know, had people's names and addresses and their membership number and the things that they purchased. And you could, you know, get rewards points for all the things that you buy there and then use your reward points to buy other things. So, they had this database with all of this information in it. And they said we have to go and mask all of this data because we want to use it in development. We don't want people to know or, you know, hold personal information for our customers. So, we went through, we did all the things that they identified as being sensitive. They gave it to one of the team members who was trying to find his own record because he was also part of the rewards program. And so, everything had been de-identified, the membership number, his name, his address, you know, everything about him was de-identified. What was not de-identified was the store number, the register number, the item that was purchased, and the purchase price.

Jan Diana

Oh, wow. Okay.

George Barroso

So, you'd think, well, how would you ever get back to that?

Jan Diana

Right.

George Barroso

Right? Well, he knew that he had bought a kayak, let's say, somewhere recently, so he grabbed his receipt, right? He checked the data. And he checked all that information. Yeah, makes sense. So, he found his record, which had his, you know, de-identified version of his membership number, let's call it 123. And then he went and he could go to all the other tables because he had knowledge of the data set. He could go to all the other tables and reproduce his entire purchase history.

Nick Shuart

Very cool.

George Barroso

That was in the database.

Nick Shuart

Very interesting.

George Barroso

So, you know, then we thought, okay, well, it's this one guy who knows the data so well, he's a quote-unquote expert, right? So, you know, how would somebody else, how would you get somebody else's information?

Jan Diana

Right.

George Barroso

So, he went out on social media and did a search and found like 50 people that had posted their receipts. Oh, look at this great new purchase. I got this kayak or I got this whatever, right? So, then he started searching for theirs and he found theirs. So, he came back with his results. We realized, oh, there's this huge kind of re-identification risk because of the fact that people are now posting things like this. They don't realize it's sensitive. It's just a receipt.

Nick Shuart

Yeah.

George Barroso

It doesn't have their name on it. Right. But they were able to reverse that and get the information back. That's very cool. Right? Social engineering expert.

Jan Diana

Yeah, very cool.

George Barroso

But, so, we ended up having to go and, you know, we did some analysis with the customer and realized, okay, well, if we change the register number and we change the store number and we modify the price amount, we're good.

Jan Diana

Interesting.

George Barroso

Right? Because how many kayaks should they sell a day? You know? Yeah. And so, that was enough to where he went and got a new set of receipts from, you know, social media postings or whatever, and then after we went that extra level, he was no longer able to re-identify. Oh, wow. But that's an example of expert determination, right? A normal person looking at that without any personal information, you know, doesn't have anything that makes sense, but they also didn't know that it's, you know, the register number, the store number, and the price amount was enough to uniquely identify that transaction and tie it back to a person.
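
One simple way to surface the kind of risk in the kayak story is to count how many rows are unique on a combination of columns that look harmless individually. The Python sketch below is a rough first check and not a substitute for an expert's statistical analysis; the function name unique_fraction and the column names in the usage note are hypothetical.

```python
from collections import Counter


def unique_fraction(rows, quasi_identifiers):
    """Fraction of rows whose combination of quasi-identifier values is unique.

    A value near 1.0 means almost every row can be singled out by anyone
    who knows those values, for example from a posted receipt.
    """
    if not rows:
        return 0.0
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)
```

Called with something like unique_fraction(rows, ["store", "register", "item", "price"]), a high result suggests that store number, register number, item, and price together act as an identifier even though the name and membership number have already been masked.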

Jan Diana

That makes sense. So. Very cool.

George Barroso

Yeah. That's an example of the expert. And how does it differ from Safe Harbor? Correct. Safe Harbor is rules-based. It's basically, if it's a name, you mask it. If it's a social security number, you mask it. If it's whatever, you know, if it's one of the 18 identifiers, you just go in and you change it.

Jan Diana

Right? Well done.

George Barroso

But, you know, obviously, expert determination, you have a little more flexibility because, you know, maybe the social security number isn't stored in a format that makes it obvious that it's a social.

Jan Diana

That makes sense.

George Barroso

Then maybe it's not so easy to determine, and it might be fine based on expert determination that, you know, that can be left in the clear because you can't map back to anybody. You don't even realize it's just a nine-digit number, right? I have one client, actually, that said, I don't need to mask social security numbers in my systems because our account numbers are also nine digits, and then we have a bunch of other nine-digit numbers that we're not storing, and the column isn't labeled social security number, so it's just a nine-digit number. You have no idea what it is. Right. Interesting. Right? So, for them, their impulse was that that was not enough of a risk. There are differences like that. Safe Harbor is, quote, unquote, safer, right? Because you're changing everything that could be sensitive.

Jan Diana

Right.

George Barroso

Expert determination, though, you know, is much better for data utility, right? Because you're not changing it, but you're just making it so that, making sure that the data set can't be re-identified.

Nick Shuart

Right. That makes sense. Okay. Interesting. So, just to be clear, I shouldn't be posting my social security number online anymore? Oh, you absolutely shouldn't. And you should probably think twice about posting any receipts. Any receipts. I mean, I've got, like, two kayaks, so I was like, wait a second. I wonder. My identity has been stolen so many times. That's right.