Enterprise AI with the Hugging Face Enterprise Hub

April 02, 2024
Curious how the Hugging Face Enterprise Hub, an enterprise-ready version of the world’s leading AI platform, can help your organization foster collaboration and innovation while maintaining a strong security and compliance posture? In this video, my colleague Derek and I walk you through the features of the Hugging Face Enterprise Hub. Learn more at https://huggingface.co/enterprise

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at https://julsimon.medium.com or Substack at https://julsimon.substack.com. ⭐️⭐️⭐️

00:00 Introduction
01:45 Model pages
05:45 Model repositories
08:20 The Community tab
10:10 The "Train" and "Deploy" buttons
12:35 Organizations
15:00 User roles
18:00 Single Sign-On
20:10 Auditing
21:30 Data locality
23:20 Resource groups
24:55 Advanced security
27:15 SOC 2 and GDPR
28:35 CI/CD integration

Transcript

Hey, my name is Derek Thomas, and I'm a machine learning engineer at Hugging Face. I'm really excited to talk to you about our Enterprise Hub and some of the features you may not know about. We've been developing it significantly, especially in the last few weeks. I have with me Julien Simon. I'll let you introduce yourself. Sure. Thanks, Derek. My name is Julien, and I am the Chief Evangelist at Hugging Face. We're going to talk about the Hub and focus on the Enterprise Hub, which, as Derek mentioned, has been moving very quickly in the last few weeks. We'll start with a quick refresher on what the Hugging Face Hub is so that we can better explain how the Enterprise Hub is different.

Okay, let's get started, Derek. A lot of people know us as a collection of models, but we've expanded into quite a number of features. You can get a good feel for what's going on in our trending section: we have Spaces, datasets, and models. We can go to the models page and see in a bit more detail what's available. One of the most exciting things is how many models we have. Do you remember how many models were on the Hub when you joined? No, I forgot. I can tell you: I did a meetup last night, and I think it was around 570,000. That means we're adding thousands a day, though I might be wrong. It feels like it. Things are moving very fast. The really awesome thing about the Hub is that it has become the default standard when someone wants to share something. That's how we started: the community needed to evolve quickly and needed a place that's more machine learning focused.

On the left, you can see the various tasks you can filter models by, along with a number of other filters. Licenses are a popular question: how do I find models that are commercially friendly? Apache 2.0 and MIT, at the top here, are probably the ones you want to look at, though not the only ones. Filtering on libraries, like PyTorch, is also useful. Languages are important too; not everyone speaks English. If you're looking for models in German, French, Arabic, or anything else, that's how you find them. I could keep scrolling and never reach the end. One thing to note is that we recently added support for the GGUF format. There's some cool compatibility there. We won't go into that in this video, but it shows how we keep evolving to meet community needs.

Let's dive into a model of choice. Let's pick the recent, exciting release from Databricks. Oh, yeah. The Databricks model. Can you tell me a little bit about Safetensors? Sure. Safetensors is a file format for model weights. Over the last year or so, we've been converting the models on the Hub to this new format. Previously, most weights were pickle objects. The problem with pickle objects is the remote chance that someone has injected malicious code, which could trigger arbitrary code execution at model loading time. Safetensors mitigates that. Security is very important, so I highly recommend using Safetensors, and GGUF as it becomes more widely available; these are safer file formats. We'll talk more about security later, but Safetensors is really important. I love that it has evolved from something convenient into something more meaningful. I think that's the trend we're showing with the public Hub and the Enterprise Hub.
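To make the Safetensors point concrete: a .safetensors file contains only tensors and metadata, so loading it never executes arbitrary code the way unpickling can. Here is a minimal sketch, assuming the safetensors and huggingface_hub packages (plus PyTorch) are installed and that the example repo ships a model.safetensors file:

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download a single weights file from the Hub (repo and filename are just examples)
weights_path = hf_hub_download(
    repo_id="distilbert-base-uncased",
    filename="model.safetensors",
)

# Loading yields a plain dict of tensors -- no pickle, no code execution
state_dict = load_file(weights_path)
print(f"{len(state_dict)} tensors, e.g. {next(iter(state_dict))}")
```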
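The task, license, library, and language filters we just walked through in the UI are also exposed through the Hub API. Here's a small sketch using the huggingface_hub client, assuming a reasonably recent version of the library; the particular filter values are only examples:

```python
from huggingface_hub import HfApi

api = HfApi()

# German text-classification models built with PyTorch, under a commercially
# friendly Apache-2.0 license, most downloaded first
models = api.list_models(
    task="text-classification",
    library="pytorch",
    language="de",
    filter="license:apache-2.0",
    sort="downloads",
    direction=-1,
    limit=5,
)
for model in models:
    print(model.id, model.downloads)
```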
You can see a lot of tags here. This one has a gated release, so you need to request access to it. We can see a lot of useful information, like where the model came from, how it was trained, its size, and whether it can be deployed. The research paper (the arXiv link) and the license are all visible, and these tags can also be used for filtering. We encourage model creators to provide as much information as possible for transparency, so you're not working with black-box APIs. What's in there? Click on this. Transparency is the cornerstone of collaboration: I need to know what's going on and what the history is. You can find the literal model history in the commit log, see what changed, and go back in time. Each repo here is a Git repo. If you use the URL of the model page, you can clone directly from it. I recommend working with the open-source libraries to load the models, but for automation and DevOps purposes, it's often easier to work with the tools you already use. You can clone the repo and feed it to your DevOps pipeline.

One question: some viewers might not know about Git and large files. Do I need to merge something like a Safetensors file that's five gigabytes? You see the LFS icon here, which stands for Git Large File Storage, a Git extension. On the Hub, we split file hosting: small files are stored in the Git repo itself, while large files are pointers to artifacts stored in object storage. This lets you manage very large files in Git without committing huge objects and pushing everything all the time. If you work with Git, make sure you have Git LFS enabled on your machine to avoid unfriendly error messages.

We've talked about collaborating from a code perspective, but what about the human perspective? The community tab. Exactly. 76 discussions already; obviously this is a hot model. It's a mini forum for that model. You can ask questions and get in touch with the model creators or other users. You can also open pull requests, just like on GitHub. If you'd like to update something in the model repo, like the readme file, the model weights, or the configuration, you can do that. The collaboration flow should feel very natural, similar to what you do with code projects on GitHub or GitLab.

It's really nice to have a model, but not many people want a model for its own sake; you typically want to use it somewhere. Here, you can see how to use it on SageMaker. You choose your task, and it automatically fills out the code. We generate the Python code based on the SageMaker SDK. You simply copy and paste it into a notebook and run it to deploy the model on AWS. We have training and inference code, and support for AWS custom accelerators like Inferentia, with Trainium coming soon. We also have code snippets for deploying models on our own service, Inference Endpoints, and on Azure ML, and we recently announced a partnership with Google Cloud, so expect a Google Cloud deployment option soon. You can always work with the open-source libraries in any environment. We want to save you time and make it simple to work with models, even if you're not an expert. Your starting point is just copying and pasting the code and hitting the ground running.
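Going back to the repository side for a moment: since each model page is a Git repository, you can clone it directly (with Git LFS installed) and feed it to your pipeline. If you'd rather stay in Python, here's a minimal equivalent sketch with huggingface_hub; the repo id is just an example:

```python
from huggingface_hub import snapshot_download

# Downloads every file in the repo, including the large LFS-hosted weights,
# into the local cache and returns the local path
local_path = snapshot_download(repo_id="distilbert-base-uncased")
print(local_path)
```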
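The pull requests mentioned for the community tab can also be opened programmatically. A hedged sketch with huggingface_hub; the repo id, file, and token are placeholders for illustration:

```python
from huggingface_hub import HfApi, CommitOperationAdd

api = HfApi(token="hf_xxx")  # placeholder token with write access

operations = [
    # Replace the remote README.md with a local copy
    CommitOperationAdd(path_in_repo="README.md", path_or_fileobj="README.md"),
]

# create_pr=True opens a pull request on the repo instead of pushing to main
api.create_commit(
    repo_id="some-org/some-model",  # placeholder repo
    operations=operations,
    commit_message="Clarify the usage section of the model card",
    create_pr=True,
)
```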
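The Deploy button generates a snippet along the lines of the sketch below. This is not the exact generated code: the model id, task, instance type, and container versions here are illustrative, so copy the versions from the snippet generated for your model:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes you're running inside SageMaker

hub_env = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # example model
    "HF_TASK": "text-classification",                                  # example task
}

model = HuggingFaceModel(
    env=hub_env,
    role=role,
    transformers_version="4.37",  # illustrative versions; use the ones the page generates
    pytorch_version="2.1",
    py_version="py310",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
print(predictor.predict({"inputs": "The Enterprise Hub demo was great!"}))
```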
We've seen how to find models, understand them, and deploy them in various ways. But what if you have a team and need to collaborate across multiple models? Collaboration goes beyond the community; it also means collaborating with colleagues inside your company. For that, we have the concept of organizations, for example the Databricks organization. An organization is a group of people who want to collaborate on models, datasets, and Spaces, and who potentially need private access without the rest of the world seeing their work. We have thousands of organizations on the Hub. For example, Intel shares their models and artifacts with the community by uploading them to their organization. They can collaborate both internally and externally, making models and datasets private or public as needed.

Someone has to decide who joins the org and what they can do, so there are permissions and roles. Derek, tell us about that. Let me open an organization I control: Demo Corp. It's easy to go to the settings and see the members. One challenge, especially for popular orgs like Intel, is deciding who should be allowed to join. They can automatically allow join requests, or remove that option to stay more private; they can let people ask to join, or add them explicitly. Admins keep everything running, and the read and write permissions are straightforward. Contributors can write to the repos they create, which is like a junior write permission. This is the basic org, and creating one is free. You can set it up in seconds and start collaborating with your internal community, making things private or public.

Some customers need more, like more security and compliance. That's where the Enterprise Hub comes in. You can see some extra features enabled here because this org is an Enterprise Hub organization. The Enterprise Hub is a commercial service; you can easily subscribe and start enjoying the extra features. Should we walk through each one? Sure. What are some features you would want in an organization? Control, governance, security, and compliance, especially depending on your location. The Enterprise Hub addresses increased security, compliance, and auditing.

A common request is syncing users with the Hub user directory. You don't want to proliferate user accounts on the Hub, because if someone leaves the company, that's another account to shut down manually, which could be a security risk. Single Sign-On lets you plug your identity provider into the Hub using OpenID Connect or SAML, so users with permissions in your IT platform get the same permissions on the Hub. This avoids duplicating user accounts and the risk of not shutting them down when employees leave.

The next thing you want is to know what has happened historically: you want to be able to audit actions. You can see all the actions on the org, like creating a private Space or adding users. Auditing is critical for regulated organizations that need to demonstrate compliance. It's not enough to say you are compliant; you need to show that controls are in place and everything is documented. This might not look sexy, but it's one of the most important features for adopting the Enterprise Hub.

Resource groups are another feature. Organizations often have many teams. The organization-level permissions grant broad access across the entire organization, but resource groups let you break users down into smaller groups with different permissions. For example, you might have a financial team, a data science team, and a software engineering team. You can add repositories and users to specific resource groups for more granular control. This avoids managing different orgs for different teams and makes governance easier.

The last feature is advanced security, which just came out. Many enterprise groups want a default for repository visibility. Some organizations want their repos to be private by default, while others want them to be public by default. You can set this based on your preference. I recommend private by default for better security: you don't want an intern to share something they shouldn't.
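To make the organization workflow concrete, here's a small sketch that creates a private repo under an org namespace and pushes a first artifact to it. The org name, repo name, file, and token are hypothetical:

```python
from huggingface_hub import HfApi

api = HfApi(token="hf_xxx")  # placeholder token with write access to the org

repo_id = "demo-corp/internal-sentiment-model"  # hypothetical org/repo

# private=True keeps the repo visible only to members of the organization
api.create_repo(repo_id=repo_id, private=True, exist_ok=True)

api.upload_file(
    path_or_fileobj="model.safetensors",  # local file to push (placeholder)
    path_in_repo="model.safetensors",
    repo_id=repo_id,
    commit_message="Initial model upload",
)
```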
You can still switch the visibility of models or datasets to public on a repo-by-repo basis, but strong defaults are generally preferable for security.

So there are quite a few new features for enterprise users, from SSO to auditing, fine-grained control over users and permissions, and data locality with regions. All of this comes from direct customer feedback, unlocking the Hub for regulated companies and users who couldn't use the public hub before. The Hugging Face Hub and its services are also SOC 2 compliant, and we can share the SOC 2 report under NDA. We are GDPR compliant and have a privacy policy; feel free to look at that and get in touch if you have questions.

The Enterprise Hub is a nice product. We had a big sale come in just yesterday, with a large organization wanting 500 users, and we expect to see this more often. One last thing: many machine learning teams have specific workflows, and no matter how comprehensive a tool is, there's probably something new or uncovered. We have a strong API that allows you to build workflows specific to your organization, like integrating with CI/CD (there's a short sketch of this right after the transcript). You have a lot of power and control with that.

You've seen the security features, the community features, and the strong configurability. That's what we wanted to cover today on the Enterprise Hub, Derek. Absolutely. Thank you so much. I really appreciate your time and expertise. That was a lot of fun. It was nice chatting with you, and maybe we'll do more; there are lots of good features that keep coming, and we'll certainly keep everyone updated. Thanks for listening. Feel free to ask questions in the comments, get in touch on LinkedIn, or visit the Hugging Face forums at discuss.huggingface.co. There are many ways to contact us and get help. Well, see you later, Derek. Thanks, everyone. Thanks for watching. Bye-bye.
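As mentioned in the discussion above, the Hub API lends itself to CI/CD integration. Here's a minimal sketch of a pipeline step that pushes build artifacts to a Hub repo; the folder layout, repo id, and environment variables are assumptions about your setup:

```python
import os
from huggingface_hub import HfApi

# In CI, read a write token from a pipeline secret instead of hard-coding it
api = HfApi(token=os.environ["HF_TOKEN"])

api.upload_folder(
    folder_path="./artifacts",                      # hypothetical build output
    repo_id="demo-corp/internal-sentiment-model",   # hypothetical target repo
    commit_message=f"CI build {os.environ.get('CI_COMMIT_SHA', 'local')}",
)
```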

Tags

Machine Learning, Hugging Face, Enterprise Hub, Model Deployment, Security, Compliance