Introducing Developer mode

You asked, we delivered - Suverenum now has Developer mode for those who want full control. Here’s what you can do:

  • Pick any model you want with the Advanced mode picker - not just our recommendations

  • Tune model parameters to adjust how the model behaves

  • Run it your way - GPU or CPU, your call

  • Monitor key metrics like tokens per second, time to first token, and more

  • Get full visibility into your context window

Whether you’re experimenting, optimising or just curious about what’s happening under the hood - this one’s for you.

:play_button: Get the latest version

Let us know what you think. Your feedback shapes what we build next!

Thank you! Question - I have a custom prompt but the LLM isn't seeing it. I've tried reloading, and it's showing up in the prompt box, but still not showing up to the LLM. Is there a way I need to apply it?

@Escher I guess you mean the system prompt, right? In my experience, the issue is model size - smaller models tend to skip the system prompt. Or do you mean something different? I will double-check on my end as well! Thank you!

Yes - the system prompt under Settings for dev mode… screenshot attached (chatting with Qwen 15b, and it seems to apply it via RAG instead of the prompt under Profile..)

1 Like

Thank you so much! I'll check on the weekend and fix it ASAP! Is there anything else missing for a good developer mode?

It looks great, actually! The only two things I noticed: 1) the thinking bubbles are always on - it might be nice to have a suppress-thinking toggle (either just hide the bubble, or toggle thinking mode on/off). I know this one can be tricky, since not all models encode thinking the same way (I've been working on a front end of my own for VRM avatar chats). And 2) a "force model reload" button to apply updated settings after changing the context window token size, prompt, or other params.

1 Like

Thank you for the kind words!

  1. the thinking bubbles are always on - might be nice to have a suppress thinking toggle (either just suppress the bubble, or toggle thinking mode on/off).

As a temporary solution, you can set the thinking budget to 0 in params, or for Qwen add /no_think to the prompt. Will add it!
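For anyone wanting to script the workaround above: Qwen3-family models honour a `/no_think` soft switch placed in the message text. A minimal sketch, assuming you build the message string yourself before sending it to the model - the helper name and the model-name check are illustrative, not part of Suverenum:

```python
# Hypothetical helper: prepend Qwen's "/no_think" soft switch so the model
# skips its thinking block. Other model families are left untouched, since
# they don't all encode thinking the same way.

def suppress_thinking(user_message: str, model_name: str) -> str:
    """Apply the /no_think switch for Qwen-family models only."""
    if "qwen" in model_name.lower():
        return "/no_think " + user_message
    return user_message

print(suppress_thinking("Summarise this article.", "Qwen3-8B"))
# → /no_think Summarise this article.
```

Setting the thinking budget to 0 in the model parameters achieves the same effect without touching the prompt.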

  2. a “force model reload” Button to apply updated settings after changing context window token size, prompt or other params.

Will add that as well. If you set "do not persist in memory" in the parameters (that's the default), there is no need for it.

Thank you! By the way - I forget, do you have a Patreon or anything? I'd love to send you a few dollars for your work, I know this isn't easy!

@Escher Rolled out an update about an hour ago:

  • Fixed the system prompt - my bad, you were right. It wasn’t working.

  • Added a thinking on/off toggle.

As for model reloading: we try not to keep anything in memory for too long, since it can degrade the experience for everyday users (e.g., video calls lagging). Also, the context window is typically tight on consumer hardware, so we recalculate it dynamically on each call (feeding in the most recent content - embeddings coming soon). This won’t have a huge impact for now, but we’re looking into a more robust strategy.
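The "recalculate the context on each call" approach described above can be sketched roughly as follows - a hypothetical helper, not Suverenum's actual code, using a crude whitespace word count as a stand-in for a real tokenizer:

```python
# Rough sketch: walk backwards from the newest message, keeping only what
# fits the context budget, then restore chronological order. A real
# implementation would use the model's tokenizer instead of str.split().

def fit_to_context(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit within max_tokens."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):
        cost = len(msg.split())  # stand-in for a real token count
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["one two three", "four five", "six seven eight nine"]
print(fit_to_context(history, 6))  # keeps only the two most recent messages
```

Recomputing the window this way trades a little per-call work for a small, predictable memory footprint, which matches the goal of not degrading the experience for everyday users.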

I forget, do you have a Patreon or anything? I'd love to send you a few dollars for your work - I know this isn't easy!

Thank you so much for the support! We’d really appreciate a review we could feature on the landing page. As for donations, I’m a fan of WWF - feel free to donate there if you’d like! :panda: