Isn't that the other way around? I thought voice is usually the most complicated to implement, especially in 4G/5G, which use modified SIP/RTP. The mic and speaker have to work too. Web browsing OTOH, at a bare minimum, requires just simulated PPP over the AT command interface, a touchscreen, and Chromium.
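For a sense of how small the "PPP over AT commands" part is, here is a minimal sketch of the two standard 3GPP TS 27.007 commands a host might send before handing a serial line to a PPP daemon. It assumes a modem that accepts the standard command set; the context id `1` and the APN `"internet"` are hypothetical placeholders, and opening the serial device and running pppd are left out so the sketch stays stdlib-only.

```python
# Sketch: AT commands that switch a modem's serial channel into PPP mode.
# Assumption: the modem speaks the standard 3GPP TS 27.007 command set.

CR = b"\r"  # AT commands are terminated with a carriage return


def pdp_context(cid: int, apn: str) -> bytes:
    # AT+CGDCONT=<cid>,"IP","<apn>" defines a packet data context.
    return f'AT+CGDCONT={cid},"IP","{apn}"'.encode("ascii") + CR


def ppp_dial(cid: int) -> bytes:
    # ATD*99***<cid># requests packet-domain service for that context;
    # after CONNECT, the line carries PPP frames instead of AT commands.
    return f"ATD*99***{cid}#".encode("ascii") + CR


if __name__ == "__main__":
    # Hypothetical context id and APN, for illustration only.
    for cmd in (pdp_context(1, "internet"), ppp_dial(1)):
        print(cmd)
```

Everything after CONNECT (IP, DNS, TLS, and the browser itself) is where the real work starts.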
If you're building your own cell modem, then, probably.
In terms of "finding hardware that can do the following things," pretty much any cell phone sold will support voice and text, though some don't do a great job with MMS. The web browser, however, requires orders of magnitude more resources than voice/text. A Nokia from 20 years ago, with... I actually can't find how much RAM or storage, had no problems with voice or text. It wouldn't run a browser at all, though.
I was admittedly of the impression that most of the voice work was handled by the baseband, though. I've not built my own phones, unfortunately. I just use some basic flip phones for mobile use.
Even on devices as recent as the Samsung Galaxy S7 and the PinePhone, voice calls are handled with high-level commands too. All you have to do is get the soundcard mixer settings right. Figuring out the correct settings is tricky, but once you have them it's plain sailing.
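To illustrate what "high-level commands" can look like, here is a minimal sketch of the byte sequences for placing a voice call, hanging up, and sending a text. It assumes a modem that accepts the standard 3GPP AT command set (TS 27.007 for calls, TS 27.005 for SMS); not every phone works this way (some basebands use vendor-specific protocols instead), and the phone number in the usage is a hypothetical placeholder. The audio path (the mixer settings mentioned above) is entirely separate and not shown.

```python
# Sketch: voice call and SMS via standard AT commands.
# Assumptions: the modem accepts the 3GPP TS 27.007 / 27.005 command set,
# and message text is plain ASCII in the modem's default character set.

CR = b"\r"
CTRL_Z = b"\x1a"  # ends the message body after AT+CMGS


def dial_voice(number: str) -> bytes:
    # The trailing ';' makes ATD place a voice call instead of a data call.
    return b"ATD" + number.encode("ascii") + b";" + CR


def hang_up() -> bytes:
    # ATH hangs up; many modems also accept AT+CHUP for this.
    return b"ATH" + CR


def send_sms_text_mode(number: str, text: str) -> list[bytes]:
    # AT+CMGF=1 selects text mode. AT+CMGS then answers with a '>' prompt,
    # after which the body is sent and terminated with Ctrl-Z.
    return [
        b"AT+CMGF=1" + CR,
        b'AT+CMGS="' + number.encode("ascii") + b'"' + CR,
        text.encode("ascii") + CTRL_Z,
    ]


if __name__ == "__main__":
    # Hypothetical number, for illustration only.
    number = "+15551234567"
    for cmd in [dial_voice(number), hang_up(), *send_sms_text_mode(number, "hi")]:
        print(cmd)
```

A real program also has to wait for `OK`, `>` and unsolicited result codes between commands, but the command surface itself is this small.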
Web browsers have a very broad interface with the underlying system.
Speaking from experience: getting calls and text working is much easier than getting a browser going in a new environment.
People, including a lot of very tenured web developers, radically underestimate how large a browser's surface area is now.
Each of WebGL, WebRTC, MediaStreams, Web Audio, and WebGPU is enormous by itself, and on mobile SoCs each relies heavily on device-specific drivers to accelerate its functions, or the battery runs out in 5 minutes.
Firefox depends on many, many parts of the operating system. Calls and texts, not so much. Starting from scratch, with a kernel and little else, getting calls and texts working is easy compared to porting Firefox.