The instruction cache is typically accessed once per instruction. In CISC machines with unusually long instructions, a single instruction may need more than one instruction cache access, but the average instruction still requires just one.
The data cache is used once per data access. In a RISC (load/store) architecture, there will be zero or one data access per instruction, because only loads and stores touch memory. In a CISC machine, the number can be higher, much higher in some cases.
The address translation cache (the translation lookaside buffer, or TLB) is used once before each instruction is fetched, and then again before each data access. In RISC-based systems, therefore, the address translation cache is used as often as the instruction and data caches combined: up to twice as often as the instruction cache, and at least twice as often as the data cache. In CISC machines, where data accesses can outnumber instruction fetches, the ratio to the data cache is lower, but this oft-ignored cache is still a key player in overall system performance. Despite this, in both RISC and CISC designs, the address translation cache is usually the smallest of the three caches.
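To make the counting argument concrete, here is a minimal Python sketch that tallies lookups for a short instruction trace. It assumes, as the paragraphs above do, that one address translation cache translates every instruction fetch and every data access. The `Instruction` record, the `cache_accesses` function and the sample traces are hypothetical, used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    """Hypothetical trace record for one executed instruction."""
    fetches: int = 1        # instruction cache accesses (more for long CISC encodings)
    data_accesses: int = 0  # data cache accesses (0 or 1 on a load/store RISC)


def cache_accesses(trace):
    """Count lookups in each cache, assuming one shared address translation
    cache translates every instruction fetch and every data access."""
    icache = sum(i.fetches for i in trace)
    dcache = sum(i.data_accesses for i in trace)
    tlb = icache + dcache  # one translation per fetch plus one per data access
    return {"icache": icache, "dcache": dcache, "tlb": tlb}


# RISC: load, add, add, store; only the load and store touch data memory.
risc = [Instruction(data_accesses=1), Instruction(), Instruction(), Instruction(data_accesses=1)]
print(cache_accesses(risc))  # {'icache': 4, 'dcache': 2, 'tlb': 6}

# CISC (hypothetical): a long memory-to-memory instruction, then a register-memory one.
cisc = [Instruction(fetches=2, data_accesses=3), Instruction(data_accesses=1)]
print(cache_accesses(cisc))  # {'icache': 3, 'dcache': 4, 'tlb': 7}
```

In the RISC trace, the translation cache sees 1.5 lookups per instruction. A trace made entirely of loads and stores would push that to the 2x upper bound.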
Some processors include Block Address Translation (BAT) registers. Each of these machine registers maps a large, contiguous area of memory without consuming TLB entries, so when used appropriately, BATs can significantly improve performance. Embedded systems, with their single-minded purposes, may find them particularly useful in user space, though using them there typically requires some modification of the system's board support software and, possibly, of the Linux kernel itself.
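The payoff from BATs comes down to reach: how much memory a translation structure can map at once. The Python sketch below compares the two under stated assumptions. It models a hypothetical 64-entry TLB with 4 KiB pages. It also assumes BAT blocks are power-of-two sizes from 128 KiB to 256 MiB, the range on classic 32-bit PowerPC parts; check your own processor's manual. The function names are illustrative, and the BAT count assumes the region's base address is suitably aligned.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_needed(region_bytes, page_bytes=4 * KIB):
    """TLB entries needed to map a region with fixed-size pages (ceiling division)."""
    return -(-region_bytes // page_bytes)


def tlb_reach(entries, page_bytes=4 * KIB):
    """Bytes a TLB can map at once when every entry holds one page."""
    return entries * page_bytes


def bat_blocks_needed(region_bytes, min_block=128 * KIB, max_block=256 * MIB):
    """Greedy count of power-of-two BAT blocks covering a region, assuming its base
    address is aligned to the largest block chosen (a simplifying assumption)."""
    remaining = region_bytes
    count = 0
    while remaining > 0:
        block = max_block
        # Shrink to the largest power-of-two block that fits, but not below the minimum.
        while block > min_block and block > remaining:
            block //= 2
        remaining -= block  # may go negative if the tail is rounded up to min_block
        count += 1
    return count


region = 64 * MIB
print(pages_needed(region))          # 16384
print(tlb_reach(64) // KIB)          # 256 (KiB reachable by a 64-entry TLB)
print(bat_blocks_needed(region))     # 1
print(bat_blocks_needed(100 * MIB))  # 3 (64 + 32 + 4 MiB)
```

Under these assumptions, mapping a 64 MiB region with 4 KiB pages would take 16,384 TLB entries, far more than a 64-entry TLB holds, while a single BAT covers it.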
So who on your project is responsible for board bring-up, and are they aware of how critical this facet of the hardware is? In most embedded environments, it is managed by software. RyteTyme knows and can show your engineers what to do.