How Synapse Medicine leverages Hugging Face to improve medication safety

September 23, 2022
During this webinar, Cynthia Périer, Data Scientist at Synapse Medicine, and Julien Simon, Chief Evangelist at Hugging Face, discuss how Synapse Medicine leverages Hugging Face to improve medication safety.
Santé.fr is the website run by the French Ministry of Health, and in particular by the French Agency for Digital Health. Its goal is to give French citizens access to all healthcare information in France. It plays a major role in prevention and provides reliable, transparent, and accessible health information. The French Agency for Digital Health wanted to provide accurate information about medications to Santé.fr users, so it reached out to Synapse Medicine to integrate its medication chatbot, named Galien, into the Santé.fr website. Synapse Medicine used Hugging Face machine learning models during the chatbot's development.
Topics covered: the impact of Transformers on health tech use cases; how Synapse Medicine leverages Hugging Face; how to perform domain-adaptive pretraining on a transformer model to improve its comprehension of pharmaco-medical data; and how the resulting model was fine-tuned on a supervised intent classification task.
Speakers: Cynthia Périer, Data Scientist at Synapse Medicine; Julien Simon, Chief Evangelist at Hugging Face.
About Synapse Medicine: Synapse Medicine's mission is to provide everyone access to the best medicine. The startup, which collaborates closely with the largest French university hospitals, has developed a Medication Intelligence platform dedicated to proper drug use. As a leader in its category, the solution is 100% independent from the pharmaceutical industry and is used today by thousands of healthcare professionals.
About Hugging Face: Hugging Face is a widely used, community-based repository for open-source ML technology. The platform stores, serves, and manages the latest open-source ML models, and lets customers fine-tune these models and deploy them at scale. Hugging Face is one of the most used platforms and is empowering 10,000 companies to integrate artificial intelligence into their products or workflows.

Transcript

Okay, I think we can get started. There will be a few more people joining, but it's time to tell you about Synapse and Hugging Face. So once again, welcome everybody. Thank you so much for joining us this morning. It's really cool to see that we have people from all over the world joining us. I appreciate it. I hope you will learn quite a few things. My name is Julien. I'm the chief evangelist for Hugging Face. I'm very happy to be joined this morning by Cynthia from Synapse. Cynthia, can you please introduce yourself?
Hi, everyone. Thanks, Julien. I'm Cynthia Périer. I joined Synapse Medicine as a data scientist more than two years ago. Before that, I did a PhD in a mathematics lab specializing in models for oncology, and I've worked in technology transfer as well. Here at Synapse, I work on many projects, such as the one we are going to talk about today, a project with the French Ministry of Health and their website, santé.fr.
Awesome. We can elaborate later. Let's get started. Cynthia, can you tell us a little bit about Synapse Medicine and the mission you set for yourselves?
The mission of Synapse Medicine is to provide reliable and useful information about drugs for everyone. The company was founded in 2017 by two medical doctors specialized in public health and pharmacology, and one engineer. Since then, we've developed several solutions for hospitals, health systems, and even the government, covering pharmacovigilance, care coordination, etc. We've also developed several COVID-19 initiatives since the beginning of the pandemic. The best known is a technology called the Medication Shield, which helped the French authorities monitor adverse drug reactions during the COVID-19 vaccination campaign. We work with health tech companies, hospitals, government, and institutions. For a few numbers, we now work in seven countries, including the USA and Japan, with several hospitals specialized in cancer, and others.
We are happy to have raised 23 million dollars last March, for a total of 40 million raised since the beginning of Synapse Medicine, to scale our technology in France and abroad. We are based in Bordeaux, though we now have an office in Paris and soon one in New York. We also have the beginnings of an office in Tokyo, Japan.
Are you hiring?
For now, I'm not sure we are hiring, but keep in touch. If you have a great resume, you can still send it.
Stay in Bordeaux; it's the best. It's a really nice city. Let me tell you a few quick things about Hugging Face. This session is really about Synapse Medicine, but as we will see, Synapse has used Hugging Face models to build a really amazing application. If you've never heard about Hugging Face, we try to build the best machine learning community out there. If you haven't visited the Hugging Face hub at huggingface.co, I would absolutely encourage you to do that today. You can sign up in a minute and get access to over 70,000 pre-trained models, mostly transformer models, but also models from additional libraries we support. These models let you work on a variety of tasks, from natural language processing to computer vision, audio, and speech. We even added video to the Transformers library just yesterday or the day before, and a few more things. Chances are you can find a pre-trained model that works for your use case. We also have close to 10,000 datasets. I checked this morning, and I think it's 9,960, so by tonight it will probably be 10,000. They are all open source, letting you quickly experiment with your machine learning problems and fine-tune models. Over 10,000 organizations, from companies, research labs, and startups to enterprises, as well as individuals, use the hub to build machine learning models and applications and to share models, datasets, and applications with the machine learning community. Huggingface.co is the place to go to get started easily. Enough about Hugging Face.
I want to hear about the great application you've built, Cynthia. What's the problem you're trying to solve here?
Reliable and useful information on drugs does exist, but it can be difficult to access for the general public and for medical professionals. A few years ago at Synapse, we started to offer an assistant where you can enter natural language questions about drugs, such as "Can I give this drug to a pregnant woman?" or "Is nausea a known adverse effect for this kind of molecule?" We started to build a model to understand and answer these questions in natural language. The French national health agency then asked us if we could provide such a tool for the general public and integrate it into their official website, which provides French citizens with access to all healthcare information in France.
If you live in France, you've probably visited this website. For our other viewers, it's the main official website where anyone can go to get healthcare information and now healthcare answers.
Yes, official, clear, and reliable healthcare information. You have hundreds of thousands of pieces of information, more than 7,000 articles, research studies, clinical trials, content, etc. On this website, you can see the little pills icon on the right. It's in French, so you can only talk with Galien in French.
Let's try a simple question: "Quelle est la posologie de Dafalgan?" ("What is the dosage of Dafalgan?") Dafalgan contains paracetamol. Here, you ask the question. The answer is, "I understand you're looking for the dosage," so how much you can take. The dosage is one pill at a time, every four hours. You can click on "See the long answer," which comes from the official leaflet you find with your medication. You can also have the short answer. The source is very important, and you can go to the official source. It's a trusted website, and you get a trusted source, which matters because trust is a huge problem for health information on the internet. Should we try another one? Yes, because this one is easy.
Let's try something more difficult, along the lines of "Can I take ibuprofen if I'm pregnant?" Okay, this is more difficult because there is more context. The answer is, "I understand you want information on this drug, Nurofencaps, which contains ibuprofen, in the context of pregnancy." You asked for a short answer. This time, I didn't use the word "pregnancy." I just said, "I'm having a baby." You get an answer that is more detailed. This one is trickier than matching keywords and regular expressions because there are so many ways to say "I'm pregnant." We need something smarter to understand what's going on.
Should we try the last one? "Does Dépakine have an impact on your quality of life?" This is a very fuzzy question. Let's try it. The answer is, "You may be looking for the adverse effects of the drug," which is the underlying intent. You probably want to know if something bad can happen if you take this. You get a very detailed answer with all the adverse effects. Everyone knows you shouldn't read this the first time you take a pill, and you shouldn't self-medicate.
These examples are interesting. One question that comes to mind is how much more difficult it is to build something for the general public than for doctors and medical professionals.
I'd say maybe more difficult, but at least different. Here, you have a classification problem, an intent classification. You have a question, you want to know the intent, and you provide an answer. For the general public, the questions will be different, and the answers will be different too. The answers are simpler, and you answer fewer types of questions, because patients do not want to know the pharmacodynamics or the half-life of a molecule. They don't need them. You want to provide an answer that comes from the official patient instructions supplied with the drug, not from the professional instructions.
It's very different because a doctor would state the question properly, being very precise and expecting a detailed, medically oriented answer. The general public will ask very fuzzy, complex questions and expect an answer they can understand, even if they don't know anything about medicine. The doctors' questions are simpler to analyze because they are precise, use the correct names of the molecules, and contain fewer spelling mistakes. We need something to unpack the questions and prepare understandable answers.
Let's move on. Assuming machine learning is a good solution for this, the first thing to do is to build a dataset. Did you start with that or did you do it differently?
Yes, you have to build a dataset. But we started with the medical doctor model. At the beginning, we didn't try a model. We just used regex patterns. For the first question, "What is the posology?", you can use a regex to detect the posology intent and tell the doctor how many pills they can prescribe to the patient. At the beginning, it was easy to use regex, provide the tool, and collect data from medical doctors using the assistant and asking questions. We collected the questions, checked whether the answers were correct, and started to re-annotate this dataset, which had been pre-annotated by the regex. It was our first dataset, a semi-automatic one. The data were corrected by data scientists and medical doctors, because here at Synapse we have a big team of pharmacists and doctors who proofread everything we do. You want an answer you can trust.
It's interesting that you combine automatic annotation and human verification by medical experts. You get the best of both worlds.
We collected lots of questions, including those with spelling mistakes, and started to build on them and try to improve the answers, because regex alone is not enough.
Once you had a first dataset, what was the first machine learning technique you applied, and how did it go?
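The regex pre-annotation step described above can be sketched roughly as follows. This is a minimal illustration only: the intent names, the patterns, and the `reviewed_label` field are hypothetical and are not Synapse Medicine's actual rules. It shows the general idea of letting regex propose a label that a human reviewer then confirms or corrects.

```python
import re

# Hypothetical intent patterns, for illustration only; the real patterns
# used by Synapse Medicine are not given in the talk.
INTENT_PATTERNS = {
    "posology": re.compile(r"\b(posologie|dosage|dose)\b", re.IGNORECASE),
    "pregnancy": re.compile(r"\b(enceinte|grossesse|pregnan\w*)\b", re.IGNORECASE),
}


def pre_annotate(question):
    """Return the first intent whose pattern matches, or None if none match."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(question):
            return intent
    return None


def build_review_queue(questions):
    """Attach the regex label to each question; a human fills in reviewed_label."""
    return [
        {"question": q, "regex_label": pre_annotate(q), "reviewed_label": None}
        for q in questions
    ]


queue = build_review_queue([
    "Quelle est la posologie de Dafalgan ?",
    "I'm having a baby, can I take ibuprofen?",
])
for row in queue:
    print(row["regex_label"], "<-", row["question"])
```

The second question gets no label because none of the keywords appear, which is the kind of case that motivates moving beyond keyword matching.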
We tried simple classical models, because you should always start with simple models. We used the scikit-learn library and benchmarked all the classification models. For text features we used TF-IDF, and for classification, all the classical models. In the end, the random forest scored well, and we implemented this model. We had some metrics and the first results. In 2019, we had a small training set of 2,000 questions covering 20 intents, which is quite a large number because there are all these medical questions about life and so on. We had a pretty decent F1 score of 0.84. But the score could be really high or much less interesting depending on the intent.
Those metrics give you a high-level picture, but if you break them down into per-class F1 scores, you'll notice some classes are recognized very well, while others are terrible. That's not okay, especially for medication. A score of 0.84 is a very good baseline and shows you can extract information from the training set.
When I was digging for information to prepare this webinar, I came across some per-intent scores, and some were really lacking. It's not dangerous to answer the wrong question, but it's useless. If we don't understand the question on Santé.fr, we might show a message saying, "Sorry, I didn't understand. Please ask again." If we understand the drug, we can provide common answers like contraindications, posology, etc. A bad answer is not helpful; it's useless.
We want to do better, and you did much better by looking at transformer models. Tell us about the switch to transformers and the particular model you started to work with.
It was good timing when I arrived at Synapse Medicine. We had this model, and I had to pick up my first topics. We saw an opportunity to try transformers, which everyone was talking about and which looked really useful. We were curious to see if the results could be improved.
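The point above about aggregate versus per-intent scores can be illustrated with a small, dependency-free sketch. The labels and predictions below are invented toy data, not Synapse Medicine's results, and the intent names are hypothetical. scikit-learn's `classification_report` produces the same kind of breakdown; the computation is written out here only to keep the example self-contained.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Compute the F1 score of each class from true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy data (hypothetical intents): one frequent intent, one rare intent.
y_true = ["posology"] * 8 + ["renal_impairment"] * 2
y_pred = ["posology"] * 9 + ["renal_impairment"] * 1

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = per_class_f1(y_true, y_pred)
print(f"accuracy: {accuracy:.2f}")
for label, f1 in scores.items():
    print(f"{label}: F1 = {f1:.2f}")
```

On this toy data the overall accuracy is 0.90, while the rare intent only reaches an F1 of about 0.67, which is the kind of gap a single aggregate score hides.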
This is where I share my love for Hugging Face, because we can get all the models and APIs quite easily. We started with a proof of concept using the CamemBERT model. If you're not French, you won't get the joke, but Camembert is the most famous French cheese, and CamemBERT is a French version of the famous BERT model. We tried CamemBERT on our classification problem, knowing our problem was quite specific, with medical and drug vocabulary. The basic CamemBERT was already an improvement, but not by much in cross-validation. On the final test set, the results were better because we had more data. The third column of results shows the scores on all the questions that regex cannot address, and here we see a huge difference.
The lesson is that it's worth trying. Start with simple things first, then try a pre-trained model, and see what works better before fine-tuning, which is what we did with Medi-Camembert.
We started with the pre-trained CamemBERT base model and continued training it on the masked language modeling (MLM) task with medical data, such as medical Wikipedia and the RCP (the summary of product characteristics, i.e. the professional instructions). This way, the model learned the relations and patterns in medical sentences. We then fine-tuned it on the intent classification task and got the results you can see. When we prepared the webinar, we looked at these numbers, and some people might think, "We only went from 0.895 to 0.912, which doesn't look impressive." But it is a huge difference. We were really happy with this model, and it's the one in production right now. On Santé.fr, some of the questions are answered by regex, and when regex cannot find the answer, the system switches to the Medi-Camembert model. The final scores are much more interesting. Even during domain-specific pre-training, we saw sentences where the MLM task showed the model's improvement. For example, if you hide a word in "the iron something was," CamemBERT would say "iron mining," while the medical BERT would say "iron concentration."
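The production flow described above, regex first and the fine-tuned model as a fallback, could be wired together along these lines. This is a hedged sketch: `regex_intent`, `route`, and `stub_classifier` are hypothetical names, the single pattern is illustrative, and the stub merely stands in for the fine-tuned Medi-Camembert classifier, whose interface is not described in the talk.

```python
import re

# Illustrative pattern only; the production regex rules are not public.
POSOLOGY = re.compile(r"\b(posologie|dosage)\b", re.IGNORECASE)


def regex_intent(question):
    """Cheap first pass: return an intent if a known pattern matches."""
    return "posology" if POSOLOGY.search(question) else None


def route(question, classifier):
    """Try regex first; fall back to the model when regex finds nothing.

    Returns (intent, source) so that each path can be evaluated separately.
    """
    intent = regex_intent(question)
    if intent is not None:
        return intent, "regex"
    return classifier(question), "model"


def stub_classifier(question):
    """Stand-in for a fine-tuned transformer classifier (assumption)."""
    return "pregnancy" if "baby" in question.lower() else "unknown"


print(route("Quelle est la posologie de Dafalgan ?", stub_classifier))
print(route("I'm having a baby, can I take ibuprofen?", stub_classifier))
```

Returning the source alongside the intent makes it straightforward to score the questions regex cannot address on their own, as in the third results column mentioned in the talk.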
This showed it could be interesting, and the results are here. We have more models, including a distilled version.
The last line in the table looks very nice. Why are you distilling? What do you expect from it?
We tried CamemBERT directly and then thought maybe we don't need such a big model. We tried a distilled version, which is not yet in production. The results are even a little better in cross-validation and on the final test set, but for the difficult questions, the big model is still better. It's a trade-off. The distilled CamemBERT takes 30% less time to train and is half the size. We should consider it. The next step would be to take all the questions that were missed and see whether they are really complicated questions.
This shows the iterative nature of machine learning. Start with simple algorithms, try something more complex, and refine it. This project is a great illustration of that. Start small, iterate, and learn what works and what doesn't. Hugging Face models and libraries are a great way to do that.
We have a little more information on the intents. This slide shows where we are good and where we are not. The latest distilled CamemBERT, pre-trained on medical data, is good on the diagonal and has some off-diagonal errors. For the patient model, some questions won't be asked a lot, like nephrology questions. Some mistakes are understandable, like confusing "posology for a child" with plain "posology." We want to target these issues. Mistakes between pharmacokinetics and pharmacodynamics are less of a problem for the general public.
Before we look at questions, what's next? Obviously, there's work on the models: building smaller or more accurate models, faster models, etc. What else are you working on?
We want to study our mistakes and maybe extend the training dataset by adding general public instructions. We want to add vocabulary closer to what the general public uses. We are also looking at multi-label models and at using the Medi-Camembert model for other projects.
For example, as you showed in the Santé.fr demo, the short answer model goes through all the instructions and keeps only the relevant sentences. We could try using the medical model for this. We have many projects that could use it, and we'll have more webinars.
We have a few more minutes to answer questions. How many documents did you need for domain adaptation?
I don't have the exact number, but it's a few thousand instruction leaflets and all of medical Wikipedia. It's quite a lot.
Did you need to change the hyperparameters when fine-tuning?
We worked on the hyperparameters of the domain-adapted BERT for performance reasons, but not much on the training parameters.
Have you considered stacking more than one transformer model? Ensembling in general?
Ensembling is usually a good answer for classification models, and we should add it to the list for the next project.
Did you have any concerns about class imbalance?
We tried data augmentation to increase support for classes with too little data. It improved things a little, but not as much as we wanted. It was a trade-off, and we were not so happy with the results.
How do you interpret the better result for the distilled model compared to the base model in cross-validation?
This model is a recent improvement, and we still want to analyze these results. A smaller model can generalize better but might be less good on difficult questions. It's a trade-off between generalization and fine-grained answers.
How do you deploy production models? What type of technologies do you use?
The models are stored on a MinIO service. We use MLflow for training. For deployment, we have a test cluster, with a QA team and a medical team testing it, and then a production cluster dedicated to Santé.fr, where the data are stored in a specific place.
We have more questions, but we need to stop now. Cynthia, should I encourage everyone to connect with you on LinkedIn?
Yes, on LinkedIn. Send me a message. Be polite and introduce yourself.
I hope there won't be a hundred of you doing that, but please connect, and I'm sure Cynthia can answer more questions directly. Cynthia, thank you so much for your time. It was super interesting.
I also want to thank Sébastien and Julien for helping me prepare for this webinar. It's a team effort.
Thank you very much, everyone. Have a great day. I hope you learned a lot. Check out Synapse Medicine, connect with Cynthia, ask more questions, and of course, if you speak French, try the bot on santé.fr. Keep following Hugging Face for more news. We have a big launch coming next week, and information is on the website. Enough talk. Have a good day, everyone. See you later. Bye-bye. Thank you.

Tags

Healthcare Information, Natural Language Processing, Machine Learning in Medicine, Transformer Models, Public Health Technology