Archive for the ‘Ruby’ Category
Ruby process & ActiveRecord data set executing in multi cores
in one of our (tekSymmetry LLC) projects we have a lot of background calculations,
which usually take many hours to complete. ever since we introduced those processes,
we have had problems with their execution time. sometimes it gets on our nerves.
as you know, a single ruby process can use only a single processor core at a time.
this is probably one of the reasons why a multi-process deployment
strategy is favoured by the ruby on rails community.
anyway, these days our servers have more than one core! more precisely,
in our case each production server has an 8 core intel xeon processor.
so the question arose: if we could run those long running, expensive processes on multiple cores,
our system would have a better chance to get faster!
well, this blog post is intended to show you the technique we used in ruby on rails.
for better understanding, let me give you some hints so you can get the context -
- we have a database table with a large number of rows!
- processing a single row doesn’t require anything from the same database table.
- we are using linux (in our case debian lenny)
so here is the way we have done it -
- we took the total row count for the main query
- and divided it by the number of cores we have
- then we forked a child process for each subset of the rows
- and executed the logic and related stuff in it!
- in the parent process we started a loop that keeps checking the status of the forked processes
- once all the pid files (which are created by the forked children) are removed,
the parent process treats it as a successful execution and ends the loop.
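the steps above can be sketched without rails like this. it is only a minimal sketch under my own assumptions: `slices_for` and `run_in_processes` are hypothetical names, and instead of polling pid files the parent simply waits on each child with Process.wait2, which is a swapped-in, simpler way to know that the children are done.

```ruby
# split `total` rows into at most `workers` contiguous [offset, limit] slices
def slices_for(total, workers)
  per_worker = (total.to_f / workers).ceil
  slices = (0...workers).map do |i|
    offset = i * per_worker
    [offset, [per_worker, total - offset].min]
  end
  # drop empty slices (happens when there are fewer rows than workers)
  slices.reject { |_offset, limit| limit <= 0 }
end

# fork one child per slice, run the block in it, and wait for all children.
# returns true only when every child exited successfully.
def run_in_processes(total, workers)
  pids = slices_for(total, workers).map do |offset, limit|
    Process.fork do
      begin
        yield(offset, limit)
        exit!(0)
      rescue StandardError
        exit!(1)
      end
    end
  end
  statuses = pids.map { |pid| Process.wait2(pid).last }
  statuses.all? { |status| status.success? }
end
```

in the rails case the block would reconnect ActiveRecord and load its slice with :offset and :limit, just like the helper in this post does.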
so you see, it is damn simple, and it is working for us.
it has made our execution 8x faster, because we get 8 cores on the new server.
here is the ruby code showing how we did it. (we created a helper “multicore_execution_helper.rb” and included it in the model, thus execute_in_multicores became usable)
require 'fileutils'

module MulticoreExecutionHelper

  def execute_in_multicores(
      p_cores, p_total_rows, p_model, p_conditions = {}, &block)

    # fall back to 2 processes when no usable core count is given
    p_cores = p_cores.to_i
    p_cores = 2 if p_cores == 0
    # round up so that the remainder rows are not skipped
    total_items_per_core = (p_total_rows.to_f / p_cores).ceil
    logger.info "[BATCH-PROCESS-LOG] Total processes - #{p_cores}, " +
      "total rows - #{p_total_rows} [#{total_items_per_core} / 1 core]"

    # Create job id for each process
    job_ids = p_cores.times.collect { |i| rand.to_s }

    # Fork process for each core and execute the block
    p_cores.times do |offset|
      Process.fork do
        logger.info "[BATCH-PROCESS-LOG] Starting process - #{offset} " +
          "assigned # #{job_ids[offset]}"

        # Keep job track through the created process pid file.
        pid_file = File.join(RAILS_ROOT, 'tmp/pids/', "#{job_ids[offset]}.pid")
        File.open(pid_file, 'w') { |f| f.puts Process.pid.to_s }

        # Since the forked process is created from a copy of the parent
        # process's memory, we need to reconnect all live connections.
        begin
          ActiveRecord::Base.connection.reconnect!

          # Retrieve this process's subset of rows through the defined
          # offset and limit
          teams = p_model.find(
            :all, {
              :offset => (offset * total_items_per_core),
              :limit => total_items_per_core}.merge(p_conditions))

          block.call(teams)
        rescue => e
          logger.error "[BATCH-PROCESS-LOG] Exception raised during " +
            "execution - #{e.inspect}"
        ensure
          # Remove pid since we are done here! (also when something failed,
          # otherwise the parent would wait forever)
          FileUtils.rm(pid_file)
        end
      end
    end

    # monitor whether the processes are completed or still in progress
    # don't return from this method unless all the forked processes have
    # completed their job
    sleep(2)

    loop do
      fully_completed = job_ids.none? do |job_id|
        File.exists?(File.join(RAILS_ROOT, 'tmp/pids/', "#{job_id}.pid"))
      end

      break if fully_completed
      sleep(2)
      logger.debug '[BATCH-PROCESS-LOG] again...'
    end
  end

end
here is the usage code -
execute_in_multicores(p_total_cores, SomeStuff.count, SomeStuff) do |some_stuffs|
  # do whatever you want to do with the stuff here! this is going to run on multiple cores!
end
see, it is really simple! if you like it, let me know how much you like it.
here you can find the code on github. best wishes!
what killed my time to run my first test using “cucumber”
stuck with the “No such file or directory - cucumber.yml” error?
then you must be making the same mistake i was making for the last hour.
i had the following code in my Rakefile -
require 'cucumber/rake/task'
Cucumber::Rake::Task.new do |t|
  profile = ENV['PROFILE'] || 'default'
  t.cucumber_opts = "--profile #{profile}"
end
the fix is to keep only the following code -
require 'cucumber/rake/task'
Cucumber::Rake::Task.new do |t|
end
are you still facing a problem where you execute “rake features” but it doesn’t come up with any output?
here is the check list (this list may grow gradually) -
1. do you have a features directory?
2. do you have a features/steps directory?
3. let’s say you have a “features/transfer.feature” file; do you know that you must also have a “features/steps/transfer_steps.rb” file?
4. do you know the steps file name must be prefixed with the feature name?
hope this helps you.
ruby dynamic factory method implementation
i was looking for a way to implement factory methods in ruby code, where i have a VersionControl::ServiceFactory that exposes each implementation through its own factory method, i.e. VersionControl::ServiceFactory::subversion, VersionControl::ServiceFactory::git and so on.
these “subversion, git, perforce” methods are not predefined; they are added whenever a new implementation is added. my skeleton (abstract) implementation lives in VersionControl::Service, so whatever implementation (svn, git, perforce, other) comes along, it will implement the methods from VersionControl::Service. since ruby doesn’t support abstract classes, this was the only way that came to my mind.
so my skeleton implementation consisted of the following code, in summary -
module VersionControl
# log entry class for data object
class LogEntry; end
# local repository information
class Information; end
# abstract service class.
# define service which will be exposed for a
# normal version controlling service.
class Service
# retrieve recent logs from the current directory
# required parameters - *base project path* and other options.
def logs(p_path, p_options = {}); raise "not implemented method" end
# generate diff from mentioned revision number
# or current revision number
# required parameters - *base project path* and other options.
def diff(p_path, p_options = {}); raise "not implemented method" end
# find repository information
def info(p_path, p_options = {}); raise "not implemented method" end
# check out content from version control server
def checkout(p_source, p_path, p_options = {}); raise "not implemented method" end
end
# factory class for supporting different version control implementation
class ServiceFactory; end
end
and the implementation is written in this way -
class ServiceFactory
  class << self
    # add subversion factory method implementation
    @@subversion_instance = SubversionService.new
    def subversion
      return @@subversion_instance
    end
  end
end
so every implementation also pushes its own factory method to the “VersionControl::ServiceFactory” class. this is how i have implemented dynamic factory methods in ruby.
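to make the idea concrete, here is a small self-contained sketch of how a later implementation could push its own factory method. this is not the code from my project: the `register` helper and the `GitService` class are hypothetical names made up for the illustration, and the Service class is trimmed down to a single method.

```ruby
module VersionControl
  # abstract service, trimmed to one method for this sketch
  class Service
    def info(p_path, p_options = {})
      raise NotImplementedError, "not implemented method"
    end
  end

  class ServiceFactory
    # hypothetical helper: defines a factory method named `p_name`
    # that always returns one shared instance of `p_service_class`
    def self.register(p_name, p_service_class)
      instance = p_service_class.new
      define_singleton_method(p_name) { instance }
    end
  end

  # hypothetical implementation, added some time after the factory was written
  class GitService < Service
    def info(p_path, p_options = {})
      "git info for #{p_path}"
    end
  end

  # the implementation pushes its own factory method
  ServiceFactory.register(:git, GitService)
end
```

after this, VersionControl::ServiceFactory.git returns the shared GitService instance, while the factory class itself never had to know about git in advance.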
upcoming project mojar_workflow, workflow engine in ruby
hi,
we are just kicking off a new open source, ruby based workflow engine project, “mojar workflow”.
we named it after our deshi word “mojar”; the reason is simply to
spread this word.
mojar workflow is an integral solution to execute a flow of business
rules. for example -
you have an action with the following set of rules -
1. start transaction
2. verify user account
3. verify user balance
4. verify user dues
5. reduce dues from balance
6. complete transaction
after a few days you get a new requirement, where you are supposed to reduce
the user's dues by 10% because of the company's new discount policy.
so you have to implement the following rules -
1. start transaction
2. verify user account
3. verify user balance
4. verify user dues
5. reduce dues by the 10% discount
6. reduce discounted dues from balance
7. complete transaction
to implement such a scenario you have to change code in your stable
release again. but using mojar workflow, you can add that new concern from
the abstract flow maintenance layer, where you can define this flow in
a yaml file or an xml document.
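the real mojar workflow format is not published yet, so the following is purely a hypothetical sketch of the idea: the yaml layout, the step names, the handler table and `run_flow` are all my assumptions for illustration. the point is only that adding the discount rule becomes one extra line in the flow definition.

```ruby
require 'yaml'

# hypothetical flow definition; the discount rule is just one more line here
FLOW_YAML = <<-FLOW
steps:
  - verify_user_balance
  - reduce_dues_by_discount
  - reduce_dues_from_balance
FLOW

# hypothetical step handlers, each one works on a shared context hash
# (amounts are kept as integers, e.g. cents, to avoid float rounding)
STEP_HANDLERS = {
  "verify_user_balance"      => lambda { |ctx| raise "no balance" if ctx[:balance] <= 0 },
  "reduce_dues_by_discount"  => lambda { |ctx| ctx[:dues] -= ctx[:dues] / 10 },
  "reduce_dues_from_balance" => lambda { |ctx| ctx[:balance] -= ctx[:dues] }
}

# run every step named in the yaml document, in order
def run_flow(p_yaml, p_handlers, p_context)
  YAML.load(p_yaml).fetch("steps").each do |step|
    p_handlers.fetch(step).call(p_context)
  end
  p_context
end
```

for example, run_flow(FLOW_YAML, STEP_HANDLERS, {:balance => 10000, :dues => 5000}) first discounts the dues and then takes the discounted dues from the balance.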
keep your eyes on -
http://rubyforge.org/projects/mojarworkflow/
best wishes,
simple fragment cache implementation on ruby on rails
i was having serious performance problems with one of my projects, so i came up with a simple fragment cache implementation on ruby on rails.
after implementing it, i replaced “render(:partial => …)” with the following method -
render_from_cache_or_render(:cache_key => "cache key",
  :cache_expire_after => ConstantHelper::TAG_CLOUD_EXPIRED_IN, # minutes
  :partial => "….")
let’s have a look at my implementation -
def render_from_cache_or_render(p_args)
return render(p_args) if true == p_args[:cache_off]
# check from cache
cache_key = p_args[:cache_key]
cached_content = CacheService.get_cache(cache_key)
if not cached_content.nil? and not cached_content.empty?
return cached_content
else
content = render(p_args)
# cache expire time if defined
cache_expire_time_in_minutes = p_args[:cache_expire_after] || 60
CacheService.add_cache(cache_key, cache_expire_time_in_minutes, content)
return content
end
end
actually, my “CacheService” class simply stores all cached items in a hash map.
when a cached item is requested, its expiry is checked before the cached value is returned.
for the CacheService implementation, look at the bottom of this post.
anyway, after implementing and using this, the application could serve 70+ requests per second. fyi, before applying the cache it was around 10 per second.
module Cache
  class Item
    attr_accessor :key, :expire_time, :content, :created_on

    def initialize(p_key, p_expire_time, p_content)
      @key = p_key
      @expire_time = p_expire_time * 60 # given in minutes, stored in seconds
      @content = p_content
      @created_on = Time.now
    end
  end
end

class CacheService
  @@CACHES = {}
  @@CACHE_EXPIRE_TIMES = {}

  def self.add_cache(p_key, p_expire_time, p_content)
    cache_item = Cache::Item.new(p_key, p_expire_time, p_content)
    @@CACHES[p_key.to_sym] = cache_item
  end

  def self.get_cache(p_key)
    # load content from cache
    cached_content = @@CACHES[p_key.to_sym]
    return nil if cached_content.nil?

    # verify cache validity
    return cached_content.content if not expired?(cached_content)
    return nil
  end

  def self.expired?(p_cache)
    # find time difference (in seconds)
    time_difference = Time.now - p_cache.created_on
    time_difference > p_cache.expire_time
  end
  # "private" has no effect on "def self." methods, so hide it explicitly
  private_class_method :expired?
end
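the same expiry idea can be condensed into a block based fetch. this is only a sketch and not part of the code above: `TtlCache`, its `fetch` method and the injectable clock are hypothetical names i am assuming here, and the clock is only there so the expiry can be tried out without sleeping.

```ruby
# hypothetical in-memory cache with per entry expiry, stored in a plain hash
class TtlCache
  Entry = Struct.new(:content, :expires_at)

  # p_clock is any object responding to call and returning the current Time
  def initialize(p_clock = lambda { Time.now })
    @entries = {}
    @clock = p_clock
  end

  # return the cached content while it is still valid,
  # otherwise run the block, cache its result and return it
  def fetch(p_key, p_ttl_minutes = 60)
    entry = @entries[p_key.to_sym]
    return entry.content if entry && @clock.call < entry.expires_at

    content = yield
    @entries[p_key.to_sym] = Entry.new(content, @clock.call + p_ttl_minutes * 60)
    content
  end
end
```

with something like this, a helper could wrap the expensive render call in a fetch block, so the block only runs again after the entry has expired.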
best wishes,
thats why i like ruby!!! thanks dynamic scripting…
if you have a rails deployment on a windows environment with mongrel running as a service, i think you might face the following problem -
Errno::EINVAL (Invalid argument):
/app/models/index_service.rb:63:in `write'
/app/models/index_service.rb:63:in `puts'
this problem was caused by a “puts” that i forgot to remove before deploying to the test server.
if you deploy as a windows service and your code has a few “puts” calls, you will most likely face this problem with mongrel.
on the mongrel group i found that they are working on this; hopefully they will replace puts with the logger or something similar.
anyway, the quickest solution i had in mind was to just use the dynamic behavior of ruby. here is what i did -
def puts(p_args)
logger.debug(p_args)
end
that's all, it fixed my problem.
thanks ruby, thanks for dynamic scripting…
simple AOP implementation in ruby
i was supposed to work on some of my other projects, but i spent my time writing a simple aop implementation in ruby.
it is neither as powerful as AspectJ nor comparable with AspectR. however, i had fun on my day off.
here is how my simple example runs -
def test_aop
# apply pointcuts
apply_advices(:before, /^do_.+/, SimpleService, SimpleServiceAdvice, :before_do)
apply_advices(:around, /^do_.+/, SimpleService, SimpleServiceAdvice, :around_do)
apply_advices(:after, /^do_.+/, SimpleService, SimpleServiceAdvice, :after_do)

simple_service = SimpleService.new
simple_service.do_1("A")
end
here are the parameter names -
apply_advices(type_of_advice, pointcuts_in_regex, service_class, advice_class, advice_method)
instead of inventing an aspectJ style pointcut syntax, i have used regex, which is fine for the time being.
if you run this code you will get the following output -
Before execution.
Around {
1 performed - A
}
After execution.
now let’s have a look at my SimpleService class.
class SimpleService
  def do_1(p_param)
    puts "1 performed - #{p_param}"
  end

  def do_2
    puts "2 performed."
  end

  def no_do
    puts "No do"
  end
end
and here is my aspect class,
class SimpleServiceAdvice
  def around_do(p_invoke)
    puts "Around { "
    output = p_invoke.proceede()
    puts "}"
    return output
  end

  def before_do(p_invoke)
    puts "Before execution."
  end

  def after_do(p_params, p_output)
    puts "After execution."
  end
end
my implementation is very straightforward. while implementing this, i really felt the strength of meta programming. it is so flexible and so easy that the sky is the limit.
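the post does not include my apply_advices code, so here is a separate, minimal sketch of how such wrapping can be done with instance_method and define_method. all names in it (`Invocation`, `wrap_with_advice`, `Greeter`, `TraceAdvice`) are hypothetical, it uses `proceed` rather than my `proceede`, and it records into an array instead of printing. treat it as one possible approach, not as my actual implementation.

```ruby
# hypothetical invocation object handed to the advice
class Invocation
  def initialize(p_target, p_original, p_args)
    @target, @original, @args = p_target, p_original, p_args
  end

  # run the wrapped (original) method
  def proceed
    @original.bind(@target).call(*@args)
  end
end

# wrap every instance method of p_service_class whose name matches p_pattern
def wrap_with_advice(p_type, p_pattern, p_service_class, p_advice_class, p_advice_method)
  advice = p_advice_class.new
  p_service_class.instance_methods(false).grep(p_pattern).each do |name|
    original = p_service_class.instance_method(name)
    p_service_class.send(:define_method, name) do |*args|
      invocation = Invocation.new(self, original, args)
      case p_type
      when :before
        advice.send(p_advice_method, invocation)
        invocation.proceed
      when :around
        # the advice decides when to call invocation.proceed
        advice.send(p_advice_method, invocation)
      when :after
        output = invocation.proceed
        advice.send(p_advice_method, args, output)
        output
      end
    end
  end
end

class Greeter
  def do_greet(p_name); "hello #{p_name}"; end
  def skip_me; "untouched"; end
end

class TraceAdvice
  TRACE = []
  def before_do(p_invoke); TRACE << "before"; end
  def around_do(p_invoke)
    TRACE << "around {"
    output = p_invoke.proceed
    TRACE << "}"
    output
  end
  def after_do(p_params, p_output); TRACE << "after #{p_output}"; end
end

# each call wraps the previous wrapper, so the last one applied runs outermost
wrap_with_advice(:around, /^do_.+/, Greeter, TraceAdvice, :around_do)
wrap_with_advice(:before, /^do_.+/, Greeter, TraceAdvice, :before_do)
wrap_with_advice(:after,  /^do_.+/, Greeter, TraceAdvice, :after_do)
```

note that in this sketch the order of the wrap_with_advice calls decides the nesting: the around advice is applied first so that the before advice ends up outside of it.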
Fat Refactoring: use include module to reduce number of lines
if i didn’t mention it before, i should tell it now: here at the somewhere in… rnd team we are playing a lot with ruby on rails. these days our rails team is completely focused on a product (which is secret for the time being) where we
found a lot of interesting stuff.
a few days back, we found that our application_helper and a few controllers were growing too fast and getting extra fat (lines of code). so we did some refactoring to reduce the extra fat.
now have a look at the code we had within application_helper.rb, taken from tag/v-0.3

this code is not completely visible in the screenshot; it is 340 lines, which was the output of our 3 iterations.
that number of lines is not much of a problem in itself, but it was difficult to keep the code concern aware and single concerned.
now have a look at our code, taken from the current trunk,

wow, now it is only 50 lines, including the copyright header.
the trick was very simple, we followed these conventions -
1. find all related functions that share the same concern
2. stick them together in a module
3. include the module to statically import all the functions
no integration error, nothing has occurred.
we are happy with this.
i think our ruby learning process is going smoothly.
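since the screenshots only show part of the code, here is a tiny sketch of the convention. the module and method names (`DateFormattingHelper`, `TextHelper`, `short_date`, `shorten`, `View`) are made up for the illustration and are not the ones from our project.

```ruby
# helpers that share the "date formatting" concern
module DateFormattingHelper
  def short_date(p_time)
    p_time.strftime("%Y-%m-%d")
  end
end

# helpers that share the "text" concern
module TextHelper
  def shorten(p_text, p_max = 10)
    p_text.length > p_max ? p_text[0, p_max] + "..." : p_text
  end
end

# the formerly fat helper now only includes the concern modules
module ApplicationHelper
  include DateFormattingHelper
  include TextHelper
end

# stand-in for a rails view, which gets all helper functions through the include
class View
  include ApplicationHelper
end
```

everything that used to call the helper functions keeps working, because the included modules expose the same method names through ApplicationHelper.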
split out test case into “preparation” and “verification” state.
i was wondering how i could make my test methods more organized and DRY.
so i had a nice time writing a test to ensure my current modification works with the existing setup.
i don’t know whether any design pattern or best practice already exists on this topic.
i came up with something that works really well for me. here is a bit about my task and an explanation of what i did.
i have a model “category” with the following relations -
has_many :category_mappings, :dependent => :destroy
has_many :items, :through => :category_mappings

has_many :categories, :foreign_key => "parent_id", :dependent => :destroy
has_many :attribute_category_mappings, :dependent => :destroy
has_many :properties, :through => :attribute_category_mappings

belongs_to :category
i was applying :dependent => :destroy to the mapping models and child categories, which need to be removed as part of the category destroy process.
so i had a messy unit test method to ensure this works. sorry for not keeping my old code, otherwise i could show the messy one. anyway, here is what i wrote later to make it a bit cleaner than the previous messy version -
def test_destroy_category_with_dependent
  category = Category.find(3)

  # prepare for verification
  prepare_for_verifying_related_property_mappings(category)
  prepare_for_verifying_category_mappings(category)
  prepare_for_verifying_child_category(category)

  # perform action
  category.destroy

  # ensure the action
  assert_raise(ActiveRecord::RecordNotFound) {
    Category.find(category.id)
  }

  # perform verification
  verify_related_property_mappings(category)
  verify_related_category_mappings(category)
  verify_related_child_category(category)
end
here i just sliced my test method into 2 different stages,
1. preparation stage
2. verification stage
1. preparation stage -
in this stage, i just record the current state or other details which are important for later verification.
2. verification stage -
in this stage, i just verify the new state by comparing it with the old state (which was recorded during the preparation stage).
so let’s have a look at the typical code i wrote for the preparation and verification stages -
def prepare_for_verifying_child_category(p_category)
  puts "Prepare for verifying related child category - #{p_category}."
  @old_category_count = Category.count
  @old_child_category_count = p_category.categories.count
  @old_child_categories = Category.find_all_by_parent_id(p_category.id)
  assert_not_equal(0, @old_child_categories.length, "No child category found.")
end

def verify_related_child_category(p_category)
  puts "verify related child categories - #{p_category}"
  now_category_count = Category.count
  assert_equal(true, ((@old_category_count - now_category_count) > 1),
    "No category has been removed.")
end
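the two stages can also be pulled into a tiny reusable helper. this is just a sketch of the idea and not something from my test suite: `StateCheck`, `prepare` and `removed` are hypothetical names, and a plain array stands in for the categories table.

```ruby
# hypothetical helper that keeps the "old state" between the two stages
class StateCheck
  def initialize
    @before = {}
  end

  # preparation stage: remember the current value of a named measurement
  def prepare(p_name)
    @before[p_name] = yield
  end

  # verification stage: how much the measurement dropped since prepare
  def removed(p_name)
    @before.fetch(p_name) - yield
  end
end

# stand-in data: a parent category with two children
categories = [:parent, :child_a, :child_b]

check = StateCheck.new
check.prepare(:categories) { categories.size }  # preparation
categories.clear                                # the action under test
removed_count = check.removed(:categories) { categories.size }  # verification
```

in a real test the blocks would call Category.count, and the verification would be an assertion on the returned difference.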
this seems pretty good to me, as long as i can keep my code simple and easy to change.
best wishes,
on your active record model define has_many with dependent models.
i was refactoring our Item model, where we have 3 has_many relations with 3 mapping models.
as we are not using InnoDB foreign key constraints, we were searching for some sort of reliable solution
which would handle this in the application layer instead of leaving it to the database.
so we introduced “:dependent” on the has_many relations. here is the top of our Item model.
has_many :category_mappings, :dependent => :destroy
has_many :categories, :through => :category_mappings
has_many :property_values, :dependent => :destroy
has_many :properties, :through => :property_values
has_many :item_location_mappings, :dependent => :destroy
has_many :locations, :through => :item_location_mappings
our “dependent” flag destroys all related records as part of the item destroy process, which has given us
flexibility and removed a lot of code, so we manage such stuff in a DRY manner.
so the following unit test worked fine for us.

the bad side,
:dependent => :destroy deletes each and every dependent record one by one, which is a big issue when you have a big chunk of dependent data.
but that is not supposed to be common in every context. we have no problem with this issue.
best of luck!
“work for fun”




