serialization - How should C++ objects be serialized?
We are doing a project on high performance computing, using MPI as the parallel computing framework. A few algorithms were already implemented on a legacy platform, and we have rewritten the original serial algorithms as parallel versions based on MPI.
I have run into a performance problem: when running the MPI-based parallel algorithm, there is a lot of communication overhead between the processes. Inter-process communication consists of 3 steps:
- Process A serializes C++ objects into a binary format.
- Process A sends the binary data to process B via MPI.
- Process B deserializes the binary data back into C++ objects.
We found that these communication steps, especially the serialize/deserialize steps, cost a huge amount of time. How could we handle this performance issue?
By the way, our C++ code uses a lot of STL, which is more complex than C-like structs.
P.S. I'm doing this (serialization) with hand-written code that traverses all the fields of the objects and copies them sequentially into a byte array.
To demonstrate what I'm doing, here is a code snippet. Note that this is a single feature construction process:
    sic::geometryfeature *ptfeature = (geometryfeature *) outlayer->getfeature(ifeature);
    sic::geometry *geom = ptfeature->getgeometry();
    std::string geomclassname = geom->getclassname();
    sic::geometry *ptgeom = geom;

    unsigned char *wkbbuffer = NULL;
    OGRGeometry *gtgeom = NULL;
    if (geomclassname == "point")
    {
        ptgeom = new sic::multipoint();
        ((sic::multipoint *) ptgeom)->insert(geom);
        gtgeom = new OGRMultiPoint();
        int wkbsize = ((sic::multipoint *) ptgeom)->wkbsize();
        wkbbuffer = (unsigned char *) malloc(wkbsize);
        ((sic::geometrycollection *) ptgeom)->exporttowkb(sic::wkbndr, wkbbuffer, wkbmultipoint);
    }
    else if (...)
    {
        ......
    }
    gtgeom->importFromWkb(wkbbuffer);
    free(wkbbuffer);
    assert(gtgeom);
    OGRFeature *pofeature = OGRFeature::CreateFeature(polayer->GetLayerDefn());
    pofeature->SetGeometry(gtgeom);
And here is more of what I'm doing when serializing the objects:
    unsigned char *bytes = (unsigned char *) malloc(size);
    size_t offset = 0;

    // header: geometry type, then feature count
    size_t type_size = sizeof(OGRwkbGeometryType);
    OGRwkbGeometryType type = layer->GetGeomType();
    memcpy(bytes + offset, &type, type_size);
    offset += type_size;

    size_t count_size = sizeof(int);
    int count = layer->GetFeatureCount();
    memcpy(bytes + offset, &count, count_size);
    offset += count_size;

    // body: each feature's geometry as WKB, back to back
    layer->ResetReading();
    for (OGRFeature *feature = layer->GetNextFeature(); feature != NULL; feature = layer->GetNextFeature())
    {
        OGRGeometry *geometry = feature->GetGeometryRef();
        if (geometry)
        {
            geometry->exportToWkb(wkbNDR, bytes + offset);
            offset += geometry->WkbSize();
        }
        else
        {
            // no geometry: decrement the count already stored after the type
            (*(int *) (bytes + type_size))--;
        }
        OGRFeature::DestroyFeature(feature);
    }
    return bytes;
Any comments are appreciated. Thanks!
(Brian's answer is offering the use of a library... he's an experienced programmer, so it sounds worth a go.)
Separately, I looked at your code - there are lots of temporary buffers, new/malloc allocations, uses of sizeof etc. I thought I'd illustrate a "quick, simple and nice" approach to cleaning this up - just enough to get you started...
First, create a binary stream type that factors out and hides a lot of the low-level work:
    #include <arpa/inet.h>  // htonl/s, ntohl/s
    #include <endian.h>     // htobe64/be64toh, if you have it...
    #include <stdint.h>
    #include <iostream>
    #include <string>
    #include <map>

    // support routines - use C++ overloading to polymorphically dispatch to htonl/s
    // uint64_t hton(uint64_t n) { return htobe64(n); }
    uint32_t hton(uint32_t n) { return htonl(n); }
    uint16_t hton(uint16_t n) { return htons(n); }
    // there are no "int" versions - this is ugly but effective...
    uint32_t hton(int32_t n) { return htonl(n); }
    uint16_t hton(int16_t n) { return htons(n); }
    // uint64_t ntoh(uint64_t n) { return be64toh(n); }
    uint32_t ntoh(uint32_t n) { return ntohl(n); }
    uint16_t ntoh(uint16_t n) { return ntohs(n); }

    template <typename OStream>
    class binary_ostream : public OStream
    {
      public:
        typedef binary_ostream This;

        This& write(const char* s, std::streamsize n)
        {
            OStream::write(s, n);
            return *this;
        }

        template <typename T>
        This& rawwrite(const T& t)
        {
            // writes a textual "[size]" marker, then the raw bytes of t
            static_cast<OStream&>(*this) << '[' << sizeof t << ']';
            return write((const char*)&t, sizeof t);
        }

        template <typename T>
        This& hton(T h)
        {
            T n = ::hton(h);
            return rawwrite(n);
        }

        // conversions for inbuilt & standard-library types...

        // single bytes go through write() so these overloads cannot call themselves
        friend This& operator<<(This& bs, bool x)
            { return bs.write(x ? "t" : "f", 1); }
        friend This& operator<<(This& bs, int8_t x)
            { return bs.write((const char*)&x, 1); }
        friend This& operator<<(This& bs, uint8_t x)
            { return bs.write((const char*)&x, 1); }
        friend This& operator<<(This& bs, int16_t x)
            { return bs.hton(x); }
        friend This& operator<<(This& bs, uint16_t x)
            { return bs.hton(x); }
        friend This& operator<<(This& bs, int32_t x)
            { return bs.hton(x); }
        friend This& operator<<(This& bs, uint32_t x)
            { return bs.hton(x); }
        friend This& operator<<(This& bs, double d)
            { return bs.rawwrite(d); }

        friend This& operator<<(This& bs, const std::string& x)
        {
            // explicit 32-bit length: a 64-bit size_t matches no overload exactly
            bs << static_cast<uint32_t>(x.size());
            return bs.write(x.data(), x.size());
        }

        template <typename K, typename V, typename A>
        friend This& operator<<(This& bs, const std::map<K, V, A>& m)
        {
            typedef typename std::map<K, V, A>::const_iterator It;
            bs << static_cast<uint32_t>(m.size());
            for (It i = m.begin(); i != m.end(); ++i)
                bs << i->first << i->second;
            return bs;
        }

        // add any others you want...
    };
Creating your own user-defined binary-serialisable type...
    // your own objects...
    struct object
    {
        object(const std::string& s, double x) : s_(s), x_(x) { }

        std::string s_;
        double x_;

        // specify how you want binary serialisation performed (which fields/order etc)
        template <typename T>
        friend binary_ostream<T>& operator<<(binary_ostream<T>& os, const object& o)
            { return os << o.s_ << o.x_; }
    };
Example usage:
    #include <cctype>
    #include <iomanip>
    #include <sstream>

    // support routines to observe/debug the serialisation...
    std::string printable(char c)
    {
        std::ostringstream oss;
        if (isprint((unsigned char)c))
            oss << c;
        else
            oss << "\\x" << std::hex << std::setw(2) << std::setfill('0')
                << (int)(uint8_t)c << std::dec;
        return oss.str();
    }

    std::string printable(const std::string& s)
    {
        std::string result;
        for (std::string::const_iterator i = s.begin(); i != s.end(); ++i)
            result += printable(*i);
        return result;
    }

    int main()
    {
        {
            binary_ostream<std::ostringstream> bs;
            object o("pi", 3.14);
            bs << o;
            std::cout << "serialised '" << printable(bs.str()) << "'\n";
        }
        {
            binary_ostream<std::ostringstream> bs;
            std::map<int, std::string> m;
            m[0] = "zero";
            m[1] = "one";
            m[2] = "two";
            bs << m;
            std::cout << "serialised '" << printable(bs.str()) << "'\n";
        }
    }
The next step is to create a binary_istream - it's very, very similar to the above. (Boost.Serialization reduces the work a little by using the '&' operator instead of the traditional << and >>, so that the same function can specify the fields for both serialisation and deserialisation.)
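As a rough illustration of that next step, here is a minimal sketch of what a binary_istream could look like. It is not part of the code above: the names binary_istream, rawread, demo_object and roundtrip_demo are made up for this example, and it assumes the writer emits plain bytes, i.e. that the "[size]" markers written by rawwrite have been removed. Only uint16_t, uint32_t, double and std::string are covered.

```cpp
#include <arpa/inet.h>  // htonl, ntohl, ntohs (POSIX)
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical mirror of binary_ostream; assumes a marker-free byte layout.
template <typename IStream>
class binary_istream : public IStream
{
  public:
    typedef binary_istream This;

    // read sizeof(T) raw bytes straight into t
    template <typename T>
    This& rawread(T& t)
    {
        IStream::read(reinterpret_cast<char*>(&t), sizeof t);
        return *this;
    }

    // integers arrive in network byte order and are converted to host order
    friend This& operator>>(This& bs, uint16_t& x)
    {
        uint16_t n = 0;
        bs.rawread(n);
        x = ntohs(n);
        return bs;
    }
    friend This& operator>>(This& bs, uint32_t& x)
    {
        uint32_t n = 0;
        bs.rawread(n);
        x = ntohl(n);
        return bs;
    }
    // doubles were written raw, so they are read raw (same-architecture only)
    friend This& operator>>(This& bs, double& d)
        { return bs.rawread(d); }

    // strings: 32-bit length prefix, then the characters
    friend This& operator>>(This& bs, std::string& s)
    {
        uint32_t size = 0;
        bs >> size;
        s.resize(size);
        if (size)
            bs.read(&s[0], size);
        return bs;
    }
};

// Default-constructible stand-in for a user type (hypothetical).
struct demo_object
{
    demo_object() : x_(0) { }
    std::string s_;
    double x_;

    // fields are read in the same order they were written
    template <typename T>
    friend binary_istream<T>& operator>>(binary_istream<T>& is, demo_object& o)
        { return is >> o.s_ >> o.x_; }
};

// Hand-builds the bytes for ("pi", 3.14) and reads them back.
inline bool roundtrip_demo()
{
    const uint32_t size_be = htonl(2);
    const double x = 3.14;
    std::string bytes(reinterpret_cast<const char*>(&size_be), sizeof size_be);
    bytes += "pi";
    bytes.append(reinterpret_cast<const char*>(&x), sizeof x);

    binary_istream<std::istringstream> bs;
    bs.str(bytes);
    demo_object o;
    bs >> o;
    return !bs.fail() && o.s_ == "pi" && o.x_ == 3.14;
}
```

Calling roundtrip_demo() should return true on a POSIX system. In real use the bytes would come from the matching output stream rather than being assembled by hand.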
Implementation notes/thoughts:
- If you prefer, you can remove the template parameter from binary_stream, and have a constructor store an arbitrary std::ostream& in a private member variable, then send the streaming operations to that data member.
  - This has the advantages of minimising code bloat from instantiations for different stream types, allowing the implementation to be hidden in a translation unit and linked in later (helps keep compilation times down in a large project), and letting you attach a binary_stream to an existing stream at any time (great if someone's passing you a pre-existing stream).
  - The "disadvantage" is that you have to explicitly forward any other ostream member functions you want accessible to binary_stream users (more control but tedious), or provide a (less convenient/elegant?) std::ostream& stream() { return s_; }-style accessor.
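The non-template variant described in these notes could be sketched roughly as follows. This is an assumption-laden illustration rather than the code from the answer: it covers only uint32_t and std::string, writes plain bytes without any "[size]" marker, and attach_demo is a made-up helper that exists only to show the wrapper being attached to a stream it does not own.

```cpp
#include <arpa/inet.h>  // htonl (POSIX)
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical non-template wrapper: forwards to a stream it does not own.
class binary_stream
{
  public:
    explicit binary_stream(std::ostream& s) : s_(s) { }

    // explicitly forwarded member function
    binary_stream& write(const char* p, std::streamsize n)
    {
        s_.write(p, n);
        return *this;
    }

    // escape hatch for anything that is not forwarded explicitly
    std::ostream& stream() { return s_; }

  private:
    std::ostream& s_;
};

// 32-bit integers are written in network (big-endian) byte order
inline binary_stream& operator<<(binary_stream& bs, uint32_t x)
{
    const uint32_t n = htonl(x);
    return bs.write(reinterpret_cast<const char*>(&n), sizeof n);
}

// strings: 32-bit length prefix, then the characters
inline binary_stream& operator<<(binary_stream& bs, const std::string& x)
{
    bs << static_cast<uint32_t>(x.size());
    return bs.write(x.data(), static_cast<std::streamsize>(x.size()));
}

// Attaches the wrapper to a pre-existing stream and returns what was written.
inline std::string attach_demo()
{
    std::ostringstream oss;   // stream created elsewhere
    binary_stream bs(oss);    // wrapper attached after the fact
    bs << std::string("hi");
    return oss.str();
}
```

Because the class is no longer a template, the operator definitions could be moved into a single source file, which is the compile-time benefit mentioned in the notes.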