Date: 2024-02

It depends on your hardware and what you are looking to get out of it. There is quite a bit of discussion here: https://www.reddit.com/r/LocalLLaMA/ Coding assistants and role-playing chats are popular, and there are now options to use your local files as a backend dataset.

The largest factor right now is video card RAM (bonus points if you have a newer Apple M2 device, since its unified memory is shared between the CPU and GPU).

The more GPU memory you have, the faster the LLMs can process and the larger (i.e. generally smarter) the models you can use.

I use this single-executable program to run models locally: https://github.com/LostRuins/koboldcpp A typical model is anywhere between 8 and 40 GB.

Nvidia just dropped this, but I have no idea how good it is: https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generative-ai/ It looks like it is based on Mistral 7B, which is censored, but I would not be surprised if people get it to run other models too.
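
To make the GPU memory point concrete, here is a rough Python sketch for checking whether a model file is likely to fit on your card. The 20% overhead for context and runtime buffers is my own assumption, not a measured figure, and the function names are made up for illustration. Treat the result as a ballpark only.

```python
def fits_in_vram(model_size_gb, vram_gb, overhead_fraction=0.2):
    """Return True if the model file plus an assumed overhead fits in GPU memory.

    overhead_fraction is a guessed allowance for context and runtime buffers
    (an assumption for this sketch, not a value from any tool's documentation).
    """
    required_gb = model_size_gb * (1.0 + overhead_fraction)
    return required_gb <= vram_gb


def largest_model_that_fits(model_sizes_gb, vram_gb, overhead_fraction=0.2):
    """Pick the largest model size (in GB) that passes fits_in_vram, or None."""
    candidates = [
        size for size in model_sizes_gb
        if fits_in_vram(size, vram_gb, overhead_fraction)
    ]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical model file sizes in GB, spanning the typical 8-40 GB range.
    sizes = [8, 13, 40]
    print(largest_model_that_fits(sizes, vram_gb=24))
```

If nothing fits, a model can often still run with part of it held in system RAM, just more slowly.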