multithreading - Caching read-mostly values: are time lookups cheaper than atomic operations?
My multithreaded app uses a bunch of read-mostly values. These values are configuration values, and they only change when an operator edits the config file and instructs the app to reload the config file without downtime. The values are accessed by multiple threads. None of the threads mutate the values. Mutation only occurs when the config file is reloaded.
Because the values can change, accessing them requires some form of synchronization. But because they change so rarely, I do not want to use mutexes:
- A normal mutex disallows multiple threads from accessing the values concurrently. Since the threads only read the values, concurrent access by those threads is safe as long as the config file isn't being reloaded.
- A read-write mutex sounds like the solution, but these tend to have a high constant overhead.
I could go lower level and use atomic operations directly. For example, I can make the config object immutable, with an atomic pointer to the latest version:
- Upon reloading the config file, create a new config object and atomically update the pointer to the currently active config object.
- The reader threads atomically load the pointer and then use the config object without further synchronization, since it is immutable.
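The two steps above can be sketched in Go roughly as follows. This is a minimal sketch, assuming Go 1.19 or later for the generic atomic.Pointer type; Config, ConfigStore, and their fields are hypothetical names chosen for illustration, not part of any existing app.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical immutable configuration snapshot.
// Its fields must never be modified after the snapshot is published.
type Config struct {
	MaxConns int
	LogLevel string
}

// ConfigStore (hypothetical) holds an atomic pointer to the active snapshot.
type ConfigStore struct {
	current atomic.Pointer[Config]
}

// NewConfigStore publishes the initial snapshot.
func NewConfigStore(initial *Config) *ConfigStore {
	s := &ConfigStore{}
	s.current.Store(initial)
	return s
}

// Load atomically reads the pointer to the active snapshot.
// The returned object can then be read without further synchronization.
func (s *ConfigStore) Load() *Config { return s.current.Load() }

// Reload atomically swaps in a freshly built snapshot.
// Readers that already hold the old pointer keep using the old snapshot.
func (s *ConfigStore) Reload(next *Config) { s.current.Store(next) }

func main() {
	store := NewConfigStore(&Config{MaxConns: 10, LogLevel: "info"})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				cfg := store.Load() // one atomic load per access
				_ = cfg.MaxConns    // plain reads of immutable data
			}
		}()
	}

	// Simulates the operator-triggered reload.
	store.Reload(&Config{MaxConns: 20, LogLevel: "debug"})
	wg.Wait()

	fmt.Println(store.Load().MaxConns) // prints 20
}
```

The design relies on never mutating a Config after it has been stored; a reload always builds a new object.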
However, atomic operations are not free either. I have a hard time finding information on what sort of overhead they impose (CPU pipeline stalls? Some sort of communication overhead between CPU cores? Do they limit concurrency? I'm not sure.) I have the feeling that it's better to avoid them when possible.
So I got the idea of caching the config pointer for a limited amount of time, e.g. 1 second. The cached pointer is accessed without synchronization. This assumes that time lookups are less expensive and have less impact on concurrency than atomic pointer operations. Is this true?
So my main question is:
- Are time lookups cheaper than atomic pointer operations?
- Does the overhead depend on granularity? For example, are seconds-granularity time lookups cheaper than nanosecond-granularity ones?
- I am mostly interested in Linux, but information about other platforms is welcome too.
My secondary questions (for a better understanding of the problem) are:
- What is the overhead of a time lookup on various operating systems? What happens during a time lookup? Does the kernel need to be called (a system call)?
- What is the overhead of an atomic pointer load and store? What happens in the CPU?
Additional information about my environment and use case:
- I am using Go for this app, but I am interested in general, language-independent information. I also have a C++ app that is heavily concurrent, so I would appreciate answers that are as language-independent as possible.
- My primary production platform is x86_64 Linux, so I am most interested in information about that platform. But since the app is used by a diverse range of users, I have to at least be aware of caveats on other platforms, even if I don't particularly optimize for them.
I'll give a partial answer:
All low-level languages have a way to cheaply and atomically load a value from memory. This does not require interlocked access on x86. In fact, on x86 it is a regular load instruction. You only need to prevent the compiler and the runtime (if any) from reordering the memory access. It's essentially a compiler barrier.
In C# this facility is surfaced through the Volatile class. In C++ there is atomic. I'm not sure what consistency level is required. I think it's an acquire load (which is cheap on x86). MSVC applies these semantics to volatile variables. I do not know about other compilers. The C standard does not specify these semantics for volatile, but some compilers do.
This facility is so cheap that I think you can stop looking for anything else. The operation should only hit a shared CPU cache line (except right after a store has been made).
See Herb Sutter's video series "atomic Weapons" for more details. Frankly, he's a more reliable source than I am.