Archive for the ‘ruby on rails’ Category
Ruby process & ActiveRecord data set executing in multi cores
you know what! in one of our (tekSymmetry LLC) projects we have a lot of background calculations,
which usually take many hours to complete. ever since we introduced those processes,
we have had problems with their execution time. sometimes it gets on our nerves.
as you know, a single ruby process can use only one processor core at a time.
this is probably one of the reasons why a multi-process deployment
strategy is favored by the ruby on rails community.
anyway, these days our servers have more than one core! more precisely,
in our case each production server has an 8-core intel xeon processor.
so the question arose: if we could run those long-running, expensive processes on multiple cores,
our system would have a better chance of getting faster!
well, this blog post is intended to show you how we have done it in ruby on rails.
for better understanding, let me give you some hints so you can get the context -
- we have database tables with a huge number of rows!
- processing a single row doesn’t require anything from the same database table.
- we are using linux (in our case debian lenny)
so here is the way we have done it -
- we took the total row count of the main query
- and divided it by the number of cores we have
- then we forked a child process for each subset of the rows
- and executed the logic and related stuff in it!
- in the parent process we started a loop that keeps checking the status of the newly forked processes
- once all the pid files (which are created by the newly forked children) are removed,
the parent process treats it as a successful execution and ends the loop.
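to make the splitting step concrete, here is a minimal sketch of how the row count can be turned into one offset/limit window per core. `row_windows` is a hypothetical helper made up for this illustration, and it rounds the division up so that rows left over by the division still land in some window.

```ruby
# hypothetical helper (not part of our module): one offset/limit window per core
def row_windows(total_rows, cores)
  per_core = (total_rows + cores - 1) / cores # integer ceiling division
  (0...cores).map do |i|
    offset = i * per_core
    # the last windows may be shorter, or empty when there are fewer rows than cores
    limit = [[per_core, total_rows - offset].min, 0].max
    { :offset => offset, :limit => limit }
  end
end

windows = row_windows(1003, 8)
windows.each { |w| puts "offset #{w[:offset]}, limit #{w[:limit]}" }
```

with 1003 rows and 8 cores, the first seven windows hold 126 rows each and the last one holds the remaining 121.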
so you see, it is damn simple, and it is working for us.
it has made our execution 8x faster, because we have 8 cores in the new server.
here is the ruby code showing how we did it. (we created a helper “multicore_execution_helper.rb“ and included it in the model, so execute_in_multicores became usable there)
require 'fileutils'

module MulticoreExecutionHelper

  def execute_in_multicores(
      p_cores, p_total_rows, p_model, p_conditions = {}, &block)

    # fall back to 2 processes when no usable core count is given
    p_cores = 2 if p_cores.to_i == 0
    # round up so the rows left over by the division are not skipped
    total_items_per_core = (p_total_rows.to_f / p_cores).ceil
    logger.info "[BATCH-PROCESS-LOG] Total processes - #{p_cores}, " +
                "total rows - #{p_total_rows} [#{total_items_per_core} / 1 core]"

    # Create job id for each process
    job_ids = p_cores.times.collect{|i| rand.to_s }

    # Fork process for each core and execute the block
    p_cores.times do |offset|
      Process.fork do
        logger.info "[BATCH-PROCESS-LOG] Starting process - #{offset} " +
                    "assigned # #{job_ids[offset]}"

        # Keep job track through the created process pid file.
        pid_file = File.join(RAILS_ROOT, 'tmp/pids/', "#{job_ids[offset]}.pid")
        File.open(pid_file, 'w') {|f| f.puts Process.pid.to_s}

        # Since the forked process starts from a copy of the parent
        # process's memory, we need to reconnect all live connections.
        begin
          ActiveRecord::Base.connection.reconnect!

          # Retrieve the rows assigned to this process through the
          # defined offset and limit
          teams = p_model.find(
              :all, {
                  :offset => (offset * total_items_per_core),
                  :limit => total_items_per_core}.merge(p_conditions))

          block.call(teams)
        rescue => e
          logger.error "[BATCH-PROCESS-LOG] Exception raised during " +
                       "execution - #{e.inspect}"
        end

        # Remove pid since we are done here!
        FileUtils.rm(pid_file)
      end
    end

    # monitor whether the processes are completed or still in progress;
    # don't return from this method until all the forked processes have
    # completed their job
    sleep(2)

    while true do
      fully_completed = true
      for job_id in job_ids
        pid_file = File.join(RAILS_ROOT, 'tmp/pids/', "#{job_id}.pid")
        if fully_completed && File.exist?(pid_file)
          fully_completed = false
          break
        end
      end

      break if fully_completed
      sleep(2)
      logger.debug '[BATCH-PROCESS-LOG] again...'
    end
  end

end
here is the usage code -
execute_in_multicores(p_total_cores, SomeStuff.count, SomeStuff) do |some_stuffs|
  # Do.. whatever you wanna do with the stuff here! these are gonna run on multicores!
end
see, it is really simple! if you like it, let me know how much you like it.
here you can find the code on github.
best wishes!
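one thing to watch with the pid file approach: a child that gets killed before it removes its pid file would leave the parent polling forever. a possible alternative, shown here only as a sketch and not as what we run in production, is to let the parent collect each child's exit status with `Process.wait2`. `run_in_children` is a hypothetical helper, and in a real rails app each child would still need to reconnect to the database first.

```ruby
# hypothetical helper: run the block once per slice, each in its own child
# process, and return true/false per child depending on how it exited
def run_in_children(slices)
  pids = slices.map do |slice|
    Process.fork do
      begin
        yield slice
        exit!(0) # exit! skips at_exit handlers inherited from the parent
      rescue StandardError
        exit!(1)
      end
    end
  end
  # wait2 blocks until that child is gone, so no pid files or sleeps are needed
  pids.map { |pid| Process.wait2(pid).last.success? }
end

statuses = run_in_children([[1, 2], [3, 4]]) do |rows|
  raise "bad row" if rows.include?(3)
end
puts statuses.inspect
```

the parent learns about a failed child from its exit status instead of inferring success from a missing file.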
Ruby on Rails demo application presentation is picked by slideshare’s editor
debugging rails internal query execution
while we were working on the somewhere in… ads project, we came up with some debugging and performance measuring tools. in this post i will describe how you can use them yourself.
query debugging –

the query debugging tool logs every query executed by active record, keeps them in memory and, with the help of some template code, displays all the queries executed for the current page.
it also runs each query with the mysql “explain” keyword, so in the same window you can see the mysql query execution plan.
it helped us to track down queries which were not hitting the right index.
this is a very simple trick - go through the code below -
module DebugUtil
  class QueryDebug
    @@QUERIES = {}

    def self.add(p_query, p_report)
      @@QUERIES[p_query] = p_report
    end

    def self.queries
      q = @@QUERIES
      clean
      return q
    end

    def self.clean
      @@QUERIES = {}
    end
  end
end
the QueryDebug class keeps every executed query and its explain result set in a class-level hash. later, in the template, QueryDebug::queries is invoked to get all the queries executed for the current page (which also empties the hash for the next request).
here is how we trap the query execution from active record -
if defined?(QUERY_DEBUG_ENABLED) && QUERY_DEBUG_ENABLED
  ActiveRecord::ConnectionAdapters::MysqlAdapter.class_eval do
    alias __existing_execute_method execute

    def execute(sql, name = nil)
      if sql.match(/^SELECT/i)
        report = []
        @connection.query("explain #{sql}").each do |row|
          report << row
        end
        DebugUtil::QueryDebug.add(sql, report)
      end
      __existing_execute_method(sql, name)
    end
  end

  Object.class_eval do
    def raise_during_query_debug
      raise DebugUtil::QueryDebug::queries.inspect
    end
  end
end
you can see we have used the “QUERY_DEBUG_ENABLED” constant to make sure this is only enabled intentionally.
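if the alias trick is new to you, here is the same idea without any database, as a minimal sketch. `FakeAdapter` and `SEEN_SELECTS` are hypothetical stand-ins invented for the example; the real hook wraps the mysql adapter as shown above.

```ruby
# hypothetical stand-in for a database adapter
class FakeAdapter
  def execute(sql, name = nil)
    "ran: #{sql}"
  end
end

SEEN_SELECTS = []

FakeAdapter.class_eval do
  # keep the original method reachable under a new name
  alias_method :execute_without_debug, :execute

  # the replacement records SELECTs, then delegates to the original
  def execute(sql, name = nil)
    SEEN_SELECTS << sql if sql =~ /\ASELECT/i
    execute_without_debug(sql, name)
  end
end

adapter = FakeAdapter.new
adapter.execute("SELECT * FROM teams")
adapter.execute("UPDATE teams SET name = 'x'")
puts SEEN_SELECTS.inspect
```

callers keep calling `execute` as before; only the wrapped class knows that something is listening in.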
now see how we render it in our template.
query debug
<%= row.join(" ") %>
we put this code in the common layout, so it renders on every page. that's all.
time based cache expiry for rails action cache
rails has excellent support for caching actions, pages, queries and so on.
the default rails behavior is more than enough for most projects. still, i was looking for a time based expiry for the “caches_action” functionality. unfortunately there wasn’t anything, so here is a simple trick i have used to make it work with different urls and time based expiration.
i added “caches_action :recent” to my controller and added the following protected method -
protected
def fragment_cache_key(p_args)
  cache_key = "cache_key_#{request.path}#{request.headers["QUERY_STRING"]}".gsub(/=/, "")
  action_cache_key = get_from_cache(cache_key)
  if action_cache_key
    return action_cache_key
  else
    action_cache_key = Digest::MD5::hexdigest("#{rand}#{Time.now}")
    add_to_cache(cache_key, action_cache_key, {:expiry => 1.hours})
    return action_cache_key
  end
end
what actually happens is that i generate a key and store it in my memcached instance with a one hour expiry limit.
so when memcached expires that key, my action cache gets invalidated as well.
this is how the default rails action cache ends up working with a time limit.
don’t think this is all: i still need to clean up the previously created cache files so i don’t get unnecessary storage consumption.
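here is the key rotation idea on its own, as a small sketch you can run outside rails. `TtlStore` is a hypothetical in-memory stand-in for memcached and `rotating_cache_key` is a made-up name; my real code uses the get_from_cache / add_to_cache helpers shown above.

```ruby
require 'digest/md5'

# hypothetical in-memory stand-in for memcached: values vanish after ttl seconds
class TtlStore
  def initialize
    @data = {}
  end

  def write(key, value, ttl, now = Time.now)
    @data[key] = [value, now + ttl]
  end

  def read(key, now = Time.now)
    value, expires_at = @data[key]
    return nil if value.nil? || now >= expires_at
    value
  end
end

# returns the current random cache key for a url and mints a new one after expiry,
# which makes everything cached under the old key unreachable
def rotating_cache_key(store, url, ttl, now = Time.now)
  existing = store.read(url, now)
  return existing if existing
  fresh = Digest::MD5.hexdigest("#{rand}#{now.to_f}")
  store.write(url, fresh, ttl, now)
  fresh
end

store = TtlStore.new
t0 = Time.at(0)
first = rotating_cache_key(store, "/recent?page=1", 3600, t0)
again = rotating_cache_key(store, "/recent?page=1", 3600, t0 + 1800)
later = rotating_cache_key(store, "/recent?page=1", 3600, t0 + 3601)
puts first == again # same key inside the hour
puts first == later # a fresh key after the hour
```

the old cached content is never deleted by this trick, it just stops being looked up, which is why the cleanup mentioned above still matters.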
nginx on debian box
i had a tough time configuring nginx on my debian production environment.
the recent stable release of nginx is 0.6.x, but the debian repository had 0.4.x, so i had to build it from source and install it.
since i had an old 0.4.x instance of nginx, the installation wasn’t as smooth as i was expecting. here i will try to show how i resolved those issues and got nginx running as a reverse proxy for my backend mongrel instances.
i made several attempts to remove the existing 0.4.x instance of nginx, but i failed.
i used “aptitude remove nginx” and ended up with the following error -
Reading package lists… Done
Building dependency tree
Reading state information… Done
The following packages will be REMOVED:
nginx
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
Need to get 0B of archives.
After unpacking 582kB disk space will be freed.
Do you want to continue [Y/n]?
(Reading database … 78227 files and directories currently installed.)
Removing nginx …
Stopping nginx: nginx.
Stopping nginx: invoke-rc.d: initscript nginx, action "stop" failed.
dpkg: error processing nginx (--remove):
subprocess pre-removal script returned error exit status 1
Starting nginx: nginx.
Errors were encountered while processing:
nginx
E: Sub-process /usr/bin/dpkg returned an error code (1)
this is not my actual error output, but it is very similar. i took it from the following url -
http://sudhanshuraheja.com/2007/09/remove-nginx-from-ubuntu-fiesty-fawn.html
the author of that blog had some suggestions, but they didn’t work for me, so i tried a different way -
i executed “sudo apt-get build-dep nginx”. i found this tip in one of the blog comments.
the comment author explained it this way -
“this should install everything required to build the package (compiler, headers/libs, packaging tools). Usually on a fresh install I do this to get everything required to build zope. Then issue “apt-get source nginx” (you need deb-src sources in /etc/apt/sources.list). This will download nginx sources (original tarball, diff, and uncompressed sources with patches applied). Just cd in source dir, make your modifications and use “dpkg-buildpackage -rfakeroot -b” (this requires fakeroot package). In parent directory you should get new deb files ready to install, with start/stops scripts and your patches. Just take care of package update that will surely remove your nginx version.”
if you want a service script that starts nginx on startup, follow this link -
http://blog.labratz.net/articles/2006/10/03/rails-deployment-apache-lighttpd-nginx-mongrel-cluster
best wishes,
upcoming project mojar_workflow, workflow engine in ruby
hi,
we are just kicking off a new open source, ruby based workflow engine project, “mojar workflow“.
we named it after our deshi word “mojar”; the reason is simply to
spread this word.
mojar workflow is an integrated solution for executing a flow of business
rules. for example -
you have an action with the following set of rules -
1. start transaction
2. verify user account
3. verify user balance
4. verify user dues
5. reduce dues from balance
6. complete transaction
after a few days you get a new requirement, where you are supposed to reduce
the user dues by 10% because of the company’s new discount policy.
so you have to implement the following rules -
1. start transaction
2. verify user account
3. verify user balance
4. verify user dues
5. reduce dues by the 10% discount
6. reduce discounted dues from balance
7. complete transaction
to implement such a scenario you have to change the code of your stable
release again. but using mojar workflow, you can add that new concern in
the abstract flow maintenance layer, where you define the flow in
a yaml file or an xml document.
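to give a feeling for the idea, here is a rough sketch of a flow that lives in yaml and is executed step by step. this is not the mojar workflow api (the project is only starting); the step names, `FLOW`, `RULES` and `run_flow` are all hypothetical, and the start/complete transaction steps are left out to keep it short.

```ruby
require 'yaml'

# hypothetical flow definition; adding the discount is a one line change here
FLOW = <<-EOS
steps:
  - verify_user_account
  - verify_user_balance
  - verify_user_dues
  - apply_discount
  - reduce_dues_from_balance
EOS

# hypothetical rule implementations; each takes the context hash and returns a new one
RULES = {
  "verify_user_account" => lambda { |ctx| raise "unknown user" unless ctx[:user]; ctx },
  "verify_user_balance" => lambda { |ctx| raise "no balance" if ctx[:balance] <= 0; ctx },
  "verify_user_dues" => lambda { |ctx| raise "no dues" if ctx[:dues] <= 0; ctx },
  "apply_discount" => lambda { |ctx| ctx.merge(:dues => ctx[:dues] * 90 / 100) },
  "reduce_dues_from_balance" => lambda { |ctx|
    ctx.merge(:balance => ctx[:balance] - ctx[:dues], :dues => 0)
  }
}

# run every step named in the yaml, in order, threading the context through
def run_flow(yaml_text, rules, context)
  YAML.load(yaml_text)["steps"].inject(context) do |ctx, step|
    rules.fetch(step).call(ctx)
  end
end

result = run_flow(FLOW, RULES, { :user => "alice", :balance => 100, :dues => 50 })
puts result.inspect
```

with a balance of 100 and dues of 50, the discounted dues are 45 and the remaining balance is 55; removing the apply_discount line from the yaml restores the old behavior without touching the rule code.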
keep your eyes on -
http://rubyforge.org/projects/mojarworkflow/
best wishes,
symlinked rails plugin broken on 1.2.5, fixed in 2.0
i was trying to build a rails plugin. my project was in a different directory, so i symlinked that directory under “vendor/plugins/..”, but i couldn’t get it working.
after spending some time on it, i could successfully run my plugin under rails 2.0-RC2. so later i compared the lookup.rb file from the 1.2.5 and 2.0-RC2 releases.
the defective code was in the following lines - (1.2.5)
def use_component_sources!
  # …
  sources << PathSource.new(:lib, "#{::RAILS_ROOT}/lib/generators")
  sources << PathSource.new(:vendor, "#{::RAILS_ROOT}/vendor/generators")
  sources << PathSource.new(:plugins, "#{::RAILS_ROOT}/vendor/plugins/**/generators")
  # …
end
the fixed version - (2.0-RC2)
def use_component_sources!
  # …
  sources << PathSource.new(:lib, "#{::RAILS_ROOT}/lib/generators")
  sources << PathSource.new(:vendor, "#{::RAILS_ROOT}/vendor/generators")
  sources << PathSource.new(:plugins, "#{::RAILS_ROOT}/vendor/plugins/*/**/generators")
  sources << PathSource.new(:plugins, "#{::RAILS_ROOT}/vendor/plugins/*/**/rails_generators")
  # …
end
i also checked the rails bug tracker and found a bug filed for this issue, which was apparently fixed in the following changeset.
http://dev.rubyonrails.org/changeset/6101
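as far as i understand it, the difference comes from how ruby's `Dir.glob` treats symlinks: the recursive `**/` part does not descend into a symlinked directory, while a plain `*` happily matches the symlink itself. the sketch below rebuilds the situation in a temp directory; `glob_symlinked_plugin` and the directory names are made up for the demo, and it needs a file system that supports symlinks.

```ruby
require 'tmpdir'
require 'fileutils'

# builds a throwaway tree with one symlinked plugin and returns what each pattern finds
def glob_symlinked_plugin
  Dir.mktmpdir do |root|
    real = File.join(root, "projects", "my_plugin") # the plugin lives elsewhere
    plugins = File.join(root, "app", "vendor", "plugins")
    FileUtils.mkdir_p(File.join(real, "generators"))
    FileUtils.mkdir_p(plugins)
    File.symlink(real, File.join(plugins, "my_plugin"))

    # report paths relative to vendor/plugins so they are easy to read
    strip = lambda { |paths| paths.map { |p| p.sub(plugins + "/", "") }.sort }
    {
      :old => strip.call(Dir.glob("#{plugins}/**/generators")),
      :new => strip.call(Dir.glob("#{plugins}/*/**/generators"))
    }
  end
end

found = glob_symlinked_plugin
puts "1.2.5 style pattern: #{found[:old].inspect}"
puts "2.0 style pattern:   #{found[:new].inspect}"
```

on the ruby versions i know of, the old pattern should come back empty and the new one should list my_plugin/generators, which matches what i saw with my symlinked plugin.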
simple fragment cache implementation on ruby on rails
i was having a serious performance problem with one of my projects, so i came up with a simple fragment cache implementation on ruby on rails.
after implementing it, i replaced “render(:partial => …)” with the following method -
render_from_cache_or_render(:cache_key => "cache key",
  :cache_expire_after => ConstantHelper::TAG_CLOUD_EXPIRED_IN, # minutes
  :partial => "….")
let’s have a look at my implementation -
def render_from_cache_or_render(p_args)
  return render(p_args) if true == p_args[:cache_off]
  # check from cache
  cache_key = p_args[:cache_key]
  cached_content = CacheService.get_cache(cache_key)

  if not cached_content.nil? and not cached_content.empty?
    return cached_content
  else
    content = render(p_args)
    # cache expire time if defined
    cache_expire_time_in_minutes = p_args[:cache_expire_after] || 60
    CacheService.add_cache(cache_key, cache_expire_time_in_minutes, content)
    return content
  end
end
actually, my “CacheService” class simply stores all cached items in a hash map.
when a cached item is requested, its expiry is checked before the cached value is returned.
for the CacheService implementation look at the bottom of this post.
anyway, after implementing and using this, i could serve 70+ requests per second. fyi, before applying the cache it was around 10 per second.
module Cache
  class Item
    attr_accessor :key, :expire_time, :content, :created_on

    def initialize(p_key, p_expire_time, p_content)
      @key = p_key
      @expire_time = p_expire_time * 60 # given in minutes, stored in seconds
      @content = p_content
      @created_on = Time.now
    end
  end
end

class CacheService
  @@CACHES = {}
  @@CACHE_EXPIRE_TIMES = {}

  def self.add_cache(p_key, p_expire_time, p_content)
    cache_item = Cache::Item.new(p_key, p_expire_time, p_content)
    @@CACHES[p_key.to_sym] = cache_item
  end

  def self.get_cache(p_key)
    # load content from cache
    cached_content = @@CACHES[p_key.to_sym]
    return nil if cached_content.nil?

    # verify cache validity
    return cached_content.content if not expired?(cached_content)
    return nil
  end

  private
  def self.expired?(p_cache)
    # find time difference (in seconds) since the item was cached
    time_difference = Time.now - p_cache.created_on
    return time_difference > p_cache.expire_time
  end
end
best wishes,
thats why i like ruby!!! thanks dynamic scripting…
if you have a rails deployment on a windows environment running as a mongrel service, i think you might face the following problem -
Errno::EINVAL (Invalid argument):
/app/models/index_service.rb:63:in `write'
/app/models/index_service.rb:63:in `puts'
this problem was caused by a “puts” that i forgot to remove before deploying to the test server.
if your deployment runs as a windows service and your code has a few “puts” calls, you will face this problem with mongrel.
on the mongrel group i found that they are working on this; hopefully they will replace puts with the logger and so on.
anyway, the quickest solution i had in mind was to just use the dynamic behavior of ruby. here is what i did -
def puts(p_args)
logger.debug(p_args)
end
that's all, it fixed my problem.
thanks ruby, thanks for dynamic scripting…
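if you want to try the trick outside rails, here is a self-contained sketch. the body of `IndexService` below is hypothetical (only the file name comes from my backtrace), and a `StringIO` stands in for the real log file.

```ruby
require 'logger'
require 'stringio'

# hypothetical service object; log_io stands in for the real log file
class IndexService
  attr_reader :logger

  def initialize(log_io)
    @logger = Logger.new(log_io)
  end

  # shadows Kernel#puts inside this class, so stray debug output goes to the log
  def puts(*args)
    logger.debug(args.join("\n"))
  end

  def reindex
    puts "reindexing..." # would have written to stdout before
    :done
  end
end

log = StringIO.new
outcome = IndexService.new(log).reindex
```

nothing is written to stdout here; the message ends up in the log, so there is no console write left to fail under a windows service.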
Fat Refactoring: use include module to reduce number of lines
if i didn’t mention it before, i should tell you now: here at the somewhere in… rnd team we are playing a lot with ruby on rails. these days our rails team is completely focused on a product (which is secret for the time being) where we
found a lot of interesting stuff. for instance,
a few days back we found that our application_helper and a few controllers were growing too fast and getting extra fat (in lines of code). so we did some refactoring to reduce the extra fat.
now have a look at the code we had in application_helper.rb, taken from tag/v-0.3

the code is not completely visible in the screenshot; it is 340 lines long, which was the output of our 3 iterations.
that number of lines is not a big problem in itself, but it made it difficult to keep the code concern aware and single concerned.
now have a look at our code taken from the current trunk,

wow, now it is only 50 lines, including the copyright header.
the trick was very simple, we followed these steps -
1. find all the related functions that share the same concern
2. stick them together in a module
3. include the module to statically import all the functions
no integration errors occurred.
we are happy with this.
i think our ruby learning process is going smoothly.
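since the screenshots can't show it well, here is a tiny sketch of the three steps. the module and method names are hypothetical, they are not the ones from our product, and `FakeView` only stands in for a rails view.

```ruby
# hypothetical helper modules, one per concern
module TagCloudHelper
  def tag_cloud_label(tag, count)
    "#{tag} (#{count})"
  end
end

module DateFormatHelper
  def short_date(time)
    time.strftime("%Y-%m-%d")
  end
end

# the fat helper shrinks to a list of includes
module ApplicationHelper
  include TagCloudHelper
  include DateFormatHelper
end

# stand-in for a rails view, which mixes ApplicationHelper in the same way
class FakeView
  include ApplicationHelper
end

view = FakeView.new
puts view.tag_cloud_label("ruby", 3)
puts view.short_date(Time.utc(2008, 1, 15))
```

the callers don't change at all, because every function is still reachable through ApplicationHelper.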





