Problem with a script - please help

lammi87

I am a member!

Offline
Joined: 07/27/2012

Hi,

I have a problem with a script which I am writing for my final thesis. The thesis must be ready within two weeks, so I would really appreciate it if you could help me. My thesis is about h-node.

The script processes two different logs of user actions to work out what kind of hardware contributions I have made to h-node. Specifically, I am trying to count how many contributions I have made per hardware class.

I have two source files for the script. Both contain lines of data which I am trying to process. Here are some examples of the content:

Source file 1:
Format: the model DEVICE_NAME has been [updated|inserted] by USER_NAME at DATE_AND_TIME


the model E220 HSDPA Modem / E270 HSDPA/HSUPA Modem has been updated by lammi87 at 22:48, 18 May 2012

Source file 2:
Format: HARDWARE_CLASS DEVICE_NAME


notebook ProBook 6555b

Source file 1 contains most of what I need. The only thing it lacks is the HARDWARE_CLASS associated with each line, so I have to get that from source file 2.

Source file 1 has one line per contribution I have made, but source file 2 does not. Even when I make multiple contributions to one device, source file 2 contains only one line with that DEVICE_NAME.

So, for each HARDWARE_CLASS, I need to count how many times each DEVICE_NAME found in source 2 appears in source 1.

The problem is this:

I can't get the array in my script to work correctly. Using a for loop, I am trying to store every DEVICE_NAME in the array, one name per slot. The names can contain white space and special characters such as ', (, ), [ and ], and the lines can end with extra tabs and spaces.

Then I will run another for loop within the other one to count how many times a line containing a particular DEVICE_NAME is present in source 1. The problem is that no matter what I try, I can't get the array to work.
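For illustration, here is a minimal sketch of one way to get one DEVICE_NAME per array slot in bash. It is not the attached script (which is not reproduced in this thread). It assumes the class is the first space-separated word of each line in source file 2; the temporary file, the `modem` class label and the variable names are made up for the demo.

```bash
#!/usr/bin/env bash
# Sketch only: load "HARDWARE_CLASS DEVICE_NAME" lines into two parallel
# bash arrays, one device per slot. Sample data and names are hypothetical.

src2=$(mktemp)
printf 'notebook ProBook 6555b  \t\n' > "$src2"   # note the trailing spaces and tab
printf 'modem E220 HSDPA Modem / E270 HSDPA/HSUPA Modem\n' >> "$src2"

classes=()
names=()
while IFS= read -r line; do
    # Strip trailing spaces and tabs so later matches are not thrown off.
    line=$(printf '%s' "$line" | sed -E 's/[[:space:]]+$//')
    [ -z "$line" ] && continue
    classes+=("${line%% *}")   # text before the first space
    names+=("${line#* }")      # everything after the first space
done < "$src2"

# Quoting "${names[$i]}" keeps a name with spaces in one piece.
for i in "${!names[@]}"; do
    printf '%s|%s\n' "${classes[$i]}" "${names[$i]}"
done
rm -f "$src2"
```

Reading line by line with `while IFS= read -r` avoids the word splitting that a `for` loop over `$(cat file)` would apply to names containing spaces.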

Sorry for the long explanation. Example source files and the script can be found as attachments below.

Attachment           Size
script.sh            819 bytes
Source_file_1.txt    558 bytes
Source_file_2.txt    287 bytes
lammi87


Here's the script again.

Attachment    Size
script.txt    819 bytes
Michał Masłowski

I am a member!

I am a translator!

Offline
Joined: 05/15/2010

http://mtjm.eu/patches/script.diff shows one possibility of fixing it.

DEVICE_NAMES is not an array; the extra trailing whitespace makes the
match fail.

cat ... | grep ... | sed ... can be replaced by one invocation of sed;
-r makes it use the more common extended regexp syntax.
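A minimal sketch of the single-sed idea follows. The original pipeline is not shown in this thread, so this assumes it selected one user's lines from source file 1 and cut out the DEVICE_NAME. The second sample line (user `someone_else`, its date) is invented. `-E` is used here; GNU sed accepts `-r` as a synonym.

```bash
#!/usr/bin/env bash
# Sketch only: one sed call that both filters lines (the grep step) and
# extracts the device name (the sed step). Sample data is partly hypothetical.

src1=$(mktemp)
cat > "$src1" <<'EOF'
the model E220 HSDPA Modem / E270 HSDPA/HSUPA Modem has been updated by lammi87 at 22:48, 18 May 2012
the model ProBook 6555b has been inserted by someone_else at 10:02, 19 May 2012
EOF

# -n suppresses the default output; the trailing "p" prints only lines where
# the substitution succeeded, so non-matching lines are dropped.
extracted=$(sed -n -E \
    's/^the model (.*) has been (updated|inserted) by lammi87 at .*$/\1/p' \
    "$src1")
printf '%s\n' "$extracted"
rm -f "$src1"
```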

Device names contain special characters, so they need shell quoting and
matching via fgrep. Since fgrep searches for lines matching any pattern
line, the inner loop can be simplified.
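To illustrate why fixed-string matching matters here, a small sketch with an invented device name (`Widget [rev. A] (v1.0)`) and invented log lines. It uses `grep -F`, which behaves like fgrep. The last call passes several newline-separated names as one pattern list, which is the simplification of the inner loop described above.

```bash
#!/usr/bin/env bash
# Sketch only: regex matching vs. fixed-string matching for names that
# contain characters such as [ ] ( ) and dots. All data is hypothetical.

src1=$(mktemp)
cat > "$src1" <<'EOF'
the model Widget [rev. A] (v1.0) has been inserted by lammi87 at 10:00, 1 May 2012
the model Widget [rev. A] (v1.0) has been updated by lammi87 at 11:00, 2 May 2012
the model Other Device has been updated by lammi87 at 12:00, 3 May 2012
EOF

name='Widget [rev. A] (v1.0)'

# As a regular expression, "[rev. A]" is a bracket expression matching ONE
# character, so the literal name is not found.
regex_count=$(grep -c -- "$name" "$src1")

# -F treats the pattern as a plain string; the double quotes keep the
# shell from splitting or globbing it.
fixed_count=$(grep -c -F -- "$name" "$src1")

# With -F, a pattern containing newlines is a list of strings; a line is
# counted if it contains any of them, so no inner loop is needed.
names_of_class=$'Widget [rev. A] (v1.0)\nOther Device'
any_count=$(grep -c -F -- "$names_of_class" "$src1")

printf 'regex=%s fixed=%s any=%s\n' "$regex_count" "$fixed_count" "$any_count"
rm -f "$src1"
```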

bash -ex is very useful when debugging such scripts.

lammi87


Thank you very much for your answer. It works better than anything I have managed myself. However, I noticed a problem. After running the updated script with my real source files 1 and 2, I get a total of 730 contributions. This does not match the result I get from a different part of the script.

There I calculate the total number of my contributions by counting the lines in source 1 which contain my user name. When I do that, I get 912. The two figures don't match. Do you have any idea why?

Again, thanks for your help, but could you spare a little more of your time, please?

Attachment               Size
script.txt               828 bytes
source-hardware-1.txt    147.43 KB
source-hardware-2.txt    22.83 KB
Michał Masłowski


There is no check that the contributions are yours: there are 730
contributions of yours and 912 (or 913?) contributions by all users for
devices listed in the other file.

I've implemented http://mtjm.eu/patches/ccontr.py to check that this
isn't a shell escape or grep issue (it outputs to standard output
instead of using the third argument).

lammi87


I can check whether any single contribution was made by me by using my user name:

I'll just change this:

let "NUM=$(fgrep -c "$DEVICE_NAMES" $1 )"

into this

let "NUM=$(grep -F "$DEVICE_NAMES" $1 | grep -c "lammi87" )"

It gave me 728. I'd say that is close enough. I'll just have to point out that the figures I get with my scripts are not 100% accurate, but accurate enough.
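For comparison, a hedged sketch of a per-class count restricted to one user. It is not the script from the attachments. It assumes the two line formats described in the first post, and it matches the surrounding words ("the model ... has been", "by USER at") so that a device whose name is part of a longer device name is not counted twice. Whether that explains any of the differences in the figures above is not established here. The sample lines, `other_user`, `ProBook 6555b Plus`, the `modem` class and the function name `count_for` are all invented for the demo.

```bash
#!/usr/bin/env bash
# Sketch only: contributions per hardware class for one user.
# All sample data below is hypothetical.

src1=$(mktemp); src2=$(mktemp)
cat > "$src1" <<'EOF'
the model ProBook 6555b has been inserted by lammi87 at 10:00, 1 May 2012
the model ProBook 6555b has been updated by lammi87 at 11:00, 2 May 2012
the model ProBook 6555b has been updated by other_user at 12:00, 3 May 2012
the model ProBook 6555b Plus has been inserted by lammi87 at 13:00, 4 May 2012
the model E220 HSDPA Modem / E270 HSDPA/HSUPA Modem has been updated by lammi87 at 22:48, 18 May 2012
EOF
cat > "$src2" <<'EOF'
notebook ProBook 6555b
notebook ProBook 6555b Plus
modem E220 HSDPA Modem / E270 HSDPA/HSUPA Modem
EOF

user='lammi87'

# count_for DEVICE_NAME: lines in source 1 for exactly that device by $user.
count_for() {
    grep -F -- "the model $1 has been " "$src1" | grep -c -F -- " by $user at "
}

# Emit "CLASS COUNT" per device, then let awk add up the counts per class.
report=$(
    while IFS= read -r line; do
        line=$(printf '%s' "$line" | sed -E 's/[[:space:]]+$//')
        [ -z "$line" ] && continue
        printf '%s %s\n' "${line%% *}" "$(count_for "${line#* }")"
    done < "$src2" |
    awk '{ total[$1] += $2 } END { for (c in total) print c, total[c] }' |
    sort
)
printf '%s\n' "$report"
rm -f "$src1" "$src2"
```

With the sample data, the line by other_user is ignored and "ProBook 6555b Plus" is counted only under its own name.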

Thanks for your help! You helped a lot!

lembas
Offline
Joined: 05/13/2010

>My thesis is about h-node.

That sounds interesting, would be nice to read it once you're done.

lammi87


That's nice to hear. I'll be sure to post it here when it is done and has been uploaded to my school's thesis system. That should take around three weeks.

My thesis aims to be an introduction to h-node for new users and to spread knowledge about it and about free software too. I also aim to extend h-node's hardware database. So nothing groundbreaking in terms of developing new features for h-node, I'm afraid.