<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Elasticsearch backup strategies</title>
	<atom:link href="http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/</link>
	<description>direct from the mysterious land of the sysadmin</description>
	<lastBuildDate>Mon, 29 Apr 2013 13:01:29 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: dan</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-131812</link>
		<dc:creator>dan</dc:creator>
		<pubDate>Tue, 23 Apr 2013 09:35:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-131812</guid>
		<description><![CDATA[Having full replication across every node is costly from a CPU / memory perspective as well, since each shard in an index is a full Lucene instance.]]></description>
		<content:encoded><![CDATA[<p>Having full replication across every node is costly from a CPU / memory perspective as well, since each shard in an index is a full Lucene instance.</p>
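<p>To put rough numbers on that, here is a minimal Python sketch. It assumes one index, evenly sized shards, and a replica count of nodes minus one so that every node holds a copy of every shard. The function name and the figures are made up for illustration.</p>

```python
def full_replication_cost(nodes, primary_shards, index_size_gb):
    """Estimate the cost of keeping a full copy of one index on every node."""
    if nodes < 1 or primary_shards < 1:
        raise ValueError("nodes and primary_shards must be at least 1")
    replicas = nodes - 1              # one primary plus (nodes - 1) replicas
    copies_per_shard = 1 + replicas   # equals the number of nodes
    return {
        "number_of_replicas": replicas,
        # every node holds one copy of every shard, each a Lucene instance
        "lucene_instances_per_node": primary_shards,
        "lucene_instances_total": primary_shards * copies_per_shard,
        "storage_total_gb": index_size_gb * copies_per_shard,
    }


# Hypothetical cluster: 4 nodes, 5 primary shards, 100 GB of primary data.
cost = full_replication_cost(nodes=4, primary_shards=5, index_size_gb=100)
print(cost)
```

<p>With those made-up figures the cluster needs 3 replicas, runs 20 Lucene instances in total and stores 400 GB instead of 100 GB.</p>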
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Sherman</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-131654</link>
		<dc:creator>Mark Sherman</dc:creator>
		<pubDate>Mon, 22 Apr 2013 15:40:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-131654</guid>
		<description><![CDATA[Okay. I&#039;m learning. (Slowly). In order to ensure full replication on every node of a multi-node cluster you would need to set replicas to nodes - 1. This is wasteful from a storage perspective, but it means you can shut down any node at any time, back up the data directory and know you have a coherent backup. The ES-based approach is much more efficient from a storage usage standpoint, but it costs time and CPU during the re-indexing process. Correct?]]></description>
		<content:encoded><![CDATA[<p>Okay. I&#8217;m learning. (Slowly). In order to ensure full replication on every node of a multi-node cluster you would need to set replicas to nodes - 1. This is wasteful from a storage perspective, but it means you can shut down any node at any time, back up the data directory and know you have a coherent backup. The ES-based approach is much more efficient from a storage usage standpoint, but it costs time and CPU during the re-indexing process. Correct?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dan</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-131641</link>
		<dc:creator>dan</dc:creator>
		<pubDate>Mon, 22 Apr 2013 13:53:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-131641</guid>
		<description><![CDATA[If you are running a simple two-node cluster where each index is fully replicated to both nodes, then your approach would work.  If, however, you are running a larger cluster, then you likely do not have full replication to every node (or even any one single node, given a large enough data set).

Please re-read the blog post - especially the section on &quot;file-based approach&quot; - as it addresses this question exactly. :)]]></description>
		<content:encoded><![CDATA[<p>If you are running a simple two-node cluster where each index is fully replicated to both nodes, then your approach would work.  If, however, you are running a larger cluster, then you likely do not have full replication to every node (or even any one single node, given a large enough data set).</p>
<p>Please re-read the blog post &#8211; especially the section on &#8220;file-based approach&#8221; &#8211; as it addresses this question exactly. :)</p>
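<p>As a small illustration of the difference, here is a Python sketch that checks which nodes, if any, hold a copy of every shard. The shard-to-node mapping is a hypothetical hand-written structure, not something taken from an Elasticsearch API response.</p>

```python
def nodes_with_full_copy(allocation):
    """Return the set of nodes that hold a copy of every shard.

    `allocation` maps a shard id to the names of the nodes holding a
    copy of that shard (a hypothetical, hand-written structure).
    """
    if not allocation:
        return set()
    holders = [set(nodes) for nodes in allocation.values()]
    return set.intersection(*holders)


# Two nodes, every shard replicated to both: either node is a full copy.
two_node = {0: ["a", "b"], 1: ["a", "b"]}

# Three nodes, one replica per shard: no single node has everything.
larger = {0: ["a", "b"], 1: ["b", "c"], 2: ["c", "a"]}

print(sorted(nodes_with_full_copy(two_node)))
print(sorted(nodes_with_full_copy(larger)))
```

<p>In the two-node case either data directory contains the whole index. In the three-node case the result is empty, so copying one node&#8217;s data directory would miss at least one shard.</p>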
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Sherman</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-131637</link>
		<dc:creator>Mark Sherman</dc:creator>
		<pubDate>Mon, 22 Apr 2013 13:39:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-131637</guid>
		<description><![CDATA[Thanks for your reply. I do see now the scroll search in the API section of the guide on elasticsearch.org, although I do not see the reindex API anywhere. Regardless of that, why would you need to run a different cluster in order to perform a hot backup? If I have two nodes in the same cluster running on different machines and I shut down one node and back up the data directory, doesn&#039;t that provide a valid backup? Sorry if this is a stupid question; I&#039;m a real novice. Thanks again.]]></description>
		<content:encoded><![CDATA[<p>Thanks for your reply. I do see now the scroll search in the API section of the guide on elasticsearch.org, although I do not see the reindex API anywhere. Regardless of that, why would you need to run a different cluster in order to perform a hot backup? If I have two nodes in the same cluster running on different machines and I shut down one node and back up the data directory, doesn&#8217;t that provide a valid backup? Sorry if this is a stupid question; I&#8217;m a real novice. Thanks again.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dan</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-131461</link>
		<dc:creator>dan</dc:creator>
		<pubDate>Sun, 21 Apr 2013 19:49:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-131461</guid>
		<description><![CDATA[Actually, that script &lt;em&gt;does&lt;/em&gt; use the API.  There are two API calls: one to set up a scroll search (i.e. gather the data), and another to &quot;reindex&quot; (i.e. import) said data into the new index.]]></description>
		<content:encoded><![CDATA[<p>Actually, that script <em>does</em> use the API.  There are two API calls: one to set up a scroll search (i.e. gather the data), and another to &#8220;reindex&#8221; (i.e. import) said data into the new index.</p>
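<p>For anyone who finds the script hard to follow, here is a rough Python sketch of the import half of the idea: turning the hits from one page of scroll results into a bulk-index request body. It is not the script from the post. The function name is hypothetical, the hits are assumed to carry <code>_id</code>, <code>_source</code> and optionally <code>_type</code>, and the HTTP calls themselves are left out.</p>

```python
import json


def hits_to_bulk(hits, target_index):
    """Build a newline-delimited bulk body that indexes `hits` into `target_index`.

    Each hit is assumed to be a dict with "_id" and "_source" keys and,
    on older Elasticsearch versions, a "_type" key.
    """
    lines = []
    for hit in hits:
        meta = {"_index": target_index, "_id": hit["_id"]}
        if "_type" in hit:
            meta["_type"] = hit["_type"]  # only present on older versions
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(hit["_source"]))   # document line
    # The bulk format expects every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical page of scroll results.
page = [
    {"_id": "1", "_type": "doc", "_source": {"title": "first"}},
    {"_id": "2", "_type": "doc", "_source": {"title": "second"}},
]
print(hits_to_bulk(page, "backup_index"), end="")
```

<p>A full export would repeat this for every page the scroll returns and send each body to the bulk endpoint of the backup cluster.</p>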
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Sherman</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-131432</link>
		<dc:creator>Mark Sherman</dc:creator>
		<pubDate>Sun, 21 Apr 2013 16:14:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-131432</guid>
		<description><![CDATA[I am trying to understand how to implement the ES-based approach described above. Basically I am trying to understand the simple import script, which contains a lot of commands that I am not familiar with. Is there a way to perform an import using the ES API?]]></description>
		<content:encoded><![CDATA[<p>I am trying to understand how to implement the ES-based approach described above. Basically I am trying to understand the simple import script, which contains a lot of commands that I am not familiar with. Is there a way to perform an import using the ES API?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lukas</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-128434</link>
		<dc:creator>Lukas</dc:creator>
		<pubDate>Sun, 07 Apr 2013 17:09:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-128434</guid>
		<description><![CDATA[Ah ok, I get it now. The Perl code was a bit too concise for me to really get how this works on the ElasticSearch level. Thanks for the clarification.]]></description>
		<content:encoded><![CDATA[<p>Ah ok, I get it now. The Perl code was a bit too concise for me to really get how this works on the ElasticSearch level. Thanks for the clarification.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dan</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-128409</link>
		<dc:creator>dan</dc:creator>
		<pubDate>Sun, 07 Apr 2013 14:41:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-128409</guid>
		<description><![CDATA[As I noted in the blog post, you&#039;re functionally bringing up a second, independent ES &quot;cluster&quot; (even if it&#039;s only one node), then importing an entire index to said cluster.  Your question doesn&#039;t really apply to the situation.]]></description>
		<content:encoded><![CDATA[<p>As I noted in the blog post, you&#8217;re functionally bringing up a second, independent ES &#8220;cluster&#8221; (even if it&#8217;s only one node), then importing an entire index to said cluster.  Your question doesn&#8217;t really apply to the situation.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lukas</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-128406</link>
		<dc:creator>Lukas</dc:creator>
		<pubDate>Sun, 07 Apr 2013 14:29:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-128406</guid>
		<description><![CDATA[How can you force one node to receive a copy of all shards? It isn&#039;t clear to me how your example script ensures this.]]></description>
		<content:encoded><![CDATA[<p>How can you force one node to receive a copy of all shards? It isn&#8217;t clear to me how your example script ensures this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dan</title>
		<link>http://www.dark.ca/2011/11/22/elasticsearch-backup-strategies/comment-page-1/#comment-15644</link>
		<dc:creator>dan</dc:creator>
		<pubDate>Thu, 24 Nov 2011 22:00:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.dark.ca/?p=299#comment-15644</guid>
		<description><![CDATA[Yes, it assumes that the backup node is capable of managing the entire contents of the cluster, that is true.  The link you posted is functionally a filesystem-based approach - it is, after all, simply rsync&#039;ing the contents of the data folder.  It&#039;s not a bad approach, but the considerations regarding quorum and consistency still apply.]]></description>
		<content:encoded><![CDATA[<p>Yes, it assumes that the backup node is capable of managing the entire contents of the cluster, that is true.  The link you posted is functionally a filesystem-based approach &#8211; it is, after all, simply rsync&#8217;ing the contents of the data folder.  It&#8217;s not a bad approach, but the considerations regarding quorum and consistency still apply.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
